linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features
@ 2025-06-05 10:48 James Clark
  2025-06-05 10:48 ` [PATCH v3 01/10] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register James Clark
                   ` (10 more replies)
  0 siblings, 11 replies; 37+ messages in thread
From: James Clark @ 2025-06-05 10:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

Support 3 new SPE features: FEAT_SPEv1p4 filters, FEAT_SPE_EFT extended
filtering, and SPE_FEAT_FDS data source filtering. The features are
independent can be applied separately:

  * Prerequisite sysreg changes - patches 1 - 2
  * FEAT_SPEv1p4 - patch 3
  * FEAT_SPE_EFT - patch 4
  * FEAT_SPE_FDS - patches 5 - 8
  * FEAT_SPE_FDS Perf tool changes - patches 9 - 11

The first two features will work with old Perfs but a Perf change to
parse the new config4 is required for the last feature.

To: 

---
Changes in v3:
- Use PMSIDR_EL1_FDS instead of 1 << PMSIDR_EL1_FDS_SHIFT
- Add VNCR offsets
- Link to v2: https://lore.kernel.org/r/20250529-james-perf-feat_spe_eft-v2-0-a01a9baad06a@linaro.org

Changes in v2:
- Fix detection of FEAT_SPE_FDS in el2_setup.h
- Pickup Marc Z's sysreg change instead which matches the json
- Restructure and expand docs changes
- Link to v1: https://lore.kernel.org/r/20250506-james-perf-feat_spe_eft-v1-0-dd480e8e4851@linaro.org

---
James Clark (10):
      arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register
      perf: arm_spe: Support FEAT_SPEv1p4 filters
      perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering
      arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
      KVM: arm64: Add trap configs for PMSDSFR_EL1
      perf: Add perf_event_attr::config4
      perf: arm_spe: Add support for filtering on data source
      tools headers UAPI: Sync linux/perf_event.h with the kernel sources
      perf tools: Add support for perf_event_attr::config4
      perf docs: arm-spe: Document new SPE filtering features

 Documentation/arch/arm64/booting.rst      |  11 ++++
 arch/arm64/include/asm/el2_setup.h        |  14 +++++
 arch/arm64/include/asm/sysreg.h           |   7 +++
 arch/arm64/include/asm/vncr_mapping.h     |   2 +
 arch/arm64/kvm/emulate-nested.c           |   1 +
 arch/arm64/kvm/sys_regs.c                 |   1 +
 arch/arm64/tools/sysreg                   |  13 +++-
 drivers/perf/arm_spe_pmu.c                | 100 +++++++++++++++++++++++++++++-
 include/uapi/linux/perf_event.h           |   2 +
 tools/include/uapi/linux/perf_event.h     |   2 +
 tools/perf/Documentation/perf-arm-spe.txt |  97 ++++++++++++++++++++++++++---
 tools/perf/tests/parse-events.c           |  14 ++++-
 tools/perf/util/parse-events.c            |  11 ++++
 tools/perf/util/parse-events.h            |   1 +
 tools/perf/util/parse-events.l            |   1 +
 tools/perf/util/pmu.c                     |   8 +++
 tools/perf/util/pmu.h                     |   1 +
 17 files changed, 273 insertions(+), 13 deletions(-)
---
base-commit: ec7714e4947909190ffb3041a03311a975350fe0
change-id: 20250312-james-perf-feat_spe_eft-66cdf4d8fe99

Best regards,
-- 
James Clark <james.clark@linaro.org>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 01/10] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
@ 2025-06-05 10:48 ` James Clark
  2025-06-12  6:44   ` Anshuman Khandual
  2025-07-14 13:32   ` Will Deacon
  2025-06-05 10:49 ` [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters James Clark
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 37+ messages in thread
From: James Clark @ 2025-06-05 10:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

Add new fields and register that are introduced for the features
FEAT_SPE_EFT (extended filtering) and FEAT_SPE_FDS (data source
filtering).

Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 arch/arm64/tools/sysreg | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 8a8cf6874298..2e897d8a4040 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -2859,11 +2859,20 @@ Field	0	RND
 EndSysreg
 
 Sysreg	PMSFCR_EL1	3	0	9	9	4
-Res0	63:19
+Res0	63:53
+Field	52	SIMDm
+Field	51	FPm
+Field	50	STm
+Field	49	LDm
+Field	48	Bm
+Res0	47:21
+Field	20	SIMD
+Field	19	FP
 Field	18	ST
 Field	17	LD
 Field	16	B
-Res0	15:4
+Res0	15:5
+Field	4	FDS
 Field	3	FnE
 Field	2	FL
 Field	1	FT

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
  2025-06-05 10:48 ` [PATCH v3 01/10] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register James Clark
@ 2025-06-05 10:49 ` James Clark
  2025-06-12  7:35   ` Anshuman Khandual
  2025-07-14 13:26   ` Will Deacon
  2025-06-05 10:49 ` [PATCH v3 03/10] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering James Clark
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 37+ messages in thread
From: James Clark @ 2025-06-05 10:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits, so
remove them from the previous version's RES0 bits using
PMSEVFR_EL1_RES0_V1P4_EXCL. It also makes some previously available bits
unavailable again, so add those back using PMSEVFR_EL1_RES0_V1P4_INCL.
E.g:

  E[30], bit [30]
  When FEAT_SPEv1p4 is _not_ implemented ...

FEAT_SPE_V1P3 has the same filters as V1P2 so explicitly add it to the
switch.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 arch/arm64/include/asm/sysreg.h | 7 +++++++
 drivers/perf/arm_spe_pmu.c      | 5 ++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index f1bb0d10c39a..880090df3efc 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -358,6 +358,13 @@
 	(PMSEVFR_EL1_RES0_IMP & ~(BIT_ULL(18) | BIT_ULL(17) | BIT_ULL(11)))
 #define PMSEVFR_EL1_RES0_V1P2	\
 	(PMSEVFR_EL1_RES0_V1P1 & ~BIT_ULL(6))
+#define PMSEVFR_EL1_RES0_V1P4_EXCL \
+	(BIT_ULL(2) | BIT_ULL(4) | GENMASK_ULL(10, 8) | GENMASK_ULL(23, 19))
+#define PMSEVFR_EL1_RES0_V1P4_INCL \
+	(GENMASK_ULL(31, 26))
+#define PMSEVFR_EL1_RES0_V1P4	\
+	(PMSEVFR_EL1_RES0_V1P4_INCL | \
+	(PMSEVFR_EL1_RES0_V1P2 & ~PMSEVFR_EL1_RES0_V1P4_EXCL))
 
 /* Buffer error reporting */
 #define PMBSR_EL1_FAULT_FSC_SHIFT	PMBSR_EL1_MSS_SHIFT
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 3efed8839a4e..d9f6d229dce8 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -701,9 +701,12 @@ static u64 arm_spe_pmsevfr_res0(u16 pmsver)
 	case ID_AA64DFR0_EL1_PMSVer_V1P1:
 		return PMSEVFR_EL1_RES0_V1P1;
 	case ID_AA64DFR0_EL1_PMSVer_V1P2:
+	case ID_AA64DFR0_EL1_PMSVer_V1P3:
+		return PMSEVFR_EL1_RES0_V1P2;
+	case ID_AA64DFR0_EL1_PMSVer_V1P4:
 	/* Return the highest version we support in default */
 	default:
-		return PMSEVFR_EL1_RES0_V1P2;
+		return PMSEVFR_EL1_RES0_V1P4;
 	}
 }
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 03/10] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
  2025-06-05 10:48 ` [PATCH v3 01/10] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register James Clark
  2025-06-05 10:49 ` [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters James Clark
@ 2025-06-05 10:49 ` James Clark
  2025-07-14 13:46   ` Will Deacon
  2025-06-05 10:49 ` [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS James Clark
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-06-05 10:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

FEAT_SPE_EFT (optional from Armv9.4) adds mask bits for the existing
load, store and branch filters. It also adds two new filter bits for
SIMD and floating point with their own associated mask bits. The current
filters only allow OR filtering on samples that are load OR store etc,
and the new mask bits allow setting part of the filter to an AND, for
example filtering samples that are store AND SIMD. With mask bits set to
0, the OR behavior is preserved, so the unless any masks are explicitly
set old filters will behave the same.

Add them all and make them behave the same way as existing format bits,
hidden and return EOPNOTSUPP if set when the feature doesn't exist.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 drivers/perf/arm_spe_pmu.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d9f6d229dce8..9309b846f642 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -86,6 +86,7 @@ struct arm_spe_pmu {
 #define SPE_PMU_FEAT_ERND			(1UL << 5)
 #define SPE_PMU_FEAT_INV_FILT_EVT		(1UL << 6)
 #define SPE_PMU_FEAT_DISCARD			(1UL << 7)
+#define SPE_PMU_FEAT_EFT			(1UL << 8)
 #define SPE_PMU_FEAT_DEV_PROBED			(1UL << 63)
 	u64					features;
 
@@ -197,6 +198,27 @@ static const struct attribute_group arm_spe_pmu_cap_group = {
 #define ATTR_CFG_FLD_discard_CFG		config	/* PMBLIMITR_EL1.FM = DISCARD */
 #define ATTR_CFG_FLD_discard_LO			35
 #define ATTR_CFG_FLD_discard_HI			35
+#define ATTR_CFG_FLD_branch_filter_mask_CFG	config	/* PMSFCR_EL1.Bm */
+#define ATTR_CFG_FLD_branch_filter_mask_LO	36
+#define ATTR_CFG_FLD_branch_filter_mask_HI	36
+#define ATTR_CFG_FLD_load_filter_mask_CFG	config	/* PMSFCR_EL1.LDm */
+#define ATTR_CFG_FLD_load_filter_mask_LO	37
+#define ATTR_CFG_FLD_load_filter_mask_HI	37
+#define ATTR_CFG_FLD_store_filter_mask_CFG	config	/* PMSFCR_EL1.STm */
+#define ATTR_CFG_FLD_store_filter_mask_LO	38
+#define ATTR_CFG_FLD_store_filter_mask_HI	38
+#define ATTR_CFG_FLD_simd_filter_CFG		config	/* PMSFCR_EL1.SIMD */
+#define ATTR_CFG_FLD_simd_filter_LO		39
+#define ATTR_CFG_FLD_simd_filter_HI		39
+#define ATTR_CFG_FLD_simd_filter_mask_CFG	config	/* PMSFCR_EL1.SIMDm */
+#define ATTR_CFG_FLD_simd_filter_mask_LO	40
+#define ATTR_CFG_FLD_simd_filter_mask_HI	40
+#define ATTR_CFG_FLD_float_filter_CFG		config	/* PMSFCR_EL1.FP */
+#define ATTR_CFG_FLD_float_filter_LO		41
+#define ATTR_CFG_FLD_float_filter_HI		41
+#define ATTR_CFG_FLD_float_filter_mask_CFG	config	/* PMSFCR_EL1.FPm */
+#define ATTR_CFG_FLD_float_filter_mask_LO	42
+#define ATTR_CFG_FLD_float_filter_mask_HI	42
 
 #define ATTR_CFG_FLD_event_filter_CFG		config1	/* PMSEVFR_EL1 */
 #define ATTR_CFG_FLD_event_filter_LO		0
@@ -215,8 +237,15 @@ GEN_PMU_FORMAT_ATTR(pa_enable);
 GEN_PMU_FORMAT_ATTR(pct_enable);
 GEN_PMU_FORMAT_ATTR(jitter);
 GEN_PMU_FORMAT_ATTR(branch_filter);
+GEN_PMU_FORMAT_ATTR(branch_filter_mask);
 GEN_PMU_FORMAT_ATTR(load_filter);
+GEN_PMU_FORMAT_ATTR(load_filter_mask);
 GEN_PMU_FORMAT_ATTR(store_filter);
+GEN_PMU_FORMAT_ATTR(store_filter_mask);
+GEN_PMU_FORMAT_ATTR(simd_filter);
+GEN_PMU_FORMAT_ATTR(simd_filter_mask);
+GEN_PMU_FORMAT_ATTR(float_filter);
+GEN_PMU_FORMAT_ATTR(float_filter_mask);
 GEN_PMU_FORMAT_ATTR(event_filter);
 GEN_PMU_FORMAT_ATTR(inv_event_filter);
 GEN_PMU_FORMAT_ATTR(min_latency);
@@ -228,8 +257,15 @@ static struct attribute *arm_spe_pmu_formats_attr[] = {
 	&format_attr_pct_enable.attr,
 	&format_attr_jitter.attr,
 	&format_attr_branch_filter.attr,
+	&format_attr_branch_filter_mask.attr,
 	&format_attr_load_filter.attr,
+	&format_attr_load_filter_mask.attr,
 	&format_attr_store_filter.attr,
+	&format_attr_store_filter_mask.attr,
+	&format_attr_simd_filter.attr,
+	&format_attr_simd_filter_mask.attr,
+	&format_attr_float_filter.attr,
+	&format_attr_float_filter_mask.attr,
 	&format_attr_event_filter.attr,
 	&format_attr_inv_event_filter.attr,
 	&format_attr_min_latency.attr,
@@ -250,6 +286,16 @@ static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
 	if (attr == &format_attr_inv_event_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_INV_FILT_EVT))
 		return 0;
 
+	if ((attr == &format_attr_branch_filter_mask.attr ||
+	     attr == &format_attr_load_filter_mask.attr ||
+	     attr == &format_attr_store_filter_mask.attr ||
+	     attr == &format_attr_simd_filter.attr ||
+	     attr == &format_attr_simd_filter_mask.attr ||
+	     attr == &format_attr_float_filter.attr ||
+	     attr == &format_attr_float_filter_mask.attr) &&
+	     !(spe_pmu->features & SPE_PMU_FEAT_EFT))
+		return 0;
+
 	return attr->mode;
 }
 
@@ -341,8 +387,15 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
 	u64 reg = 0;
 
 	reg |= FIELD_PREP(PMSFCR_EL1_LD, ATTR_CFG_GET_FLD(attr, load_filter));
+	reg |= FIELD_PREP(PMSFCR_EL1_LDm, ATTR_CFG_GET_FLD(attr, load_filter_mask));
 	reg |= FIELD_PREP(PMSFCR_EL1_ST, ATTR_CFG_GET_FLD(attr, store_filter));
+	reg |= FIELD_PREP(PMSFCR_EL1_STm, ATTR_CFG_GET_FLD(attr, store_filter_mask));
 	reg |= FIELD_PREP(PMSFCR_EL1_B, ATTR_CFG_GET_FLD(attr, branch_filter));
+	reg |= FIELD_PREP(PMSFCR_EL1_Bm, ATTR_CFG_GET_FLD(attr, branch_filter_mask));
+	reg |= FIELD_PREP(PMSFCR_EL1_SIMD, ATTR_CFG_GET_FLD(attr, simd_filter));
+	reg |= FIELD_PREP(PMSFCR_EL1_SIMDm, ATTR_CFG_GET_FLD(attr, simd_filter_mask));
+	reg |= FIELD_PREP(PMSFCR_EL1_FP, ATTR_CFG_GET_FLD(attr, float_filter));
+	reg |= FIELD_PREP(PMSFCR_EL1_FPm, ATTR_CFG_GET_FLD(attr, float_filter_mask));
 
 	if (reg)
 		reg |= PMSFCR_EL1_FT;
@@ -716,6 +769,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 	u64 reg;
 	struct perf_event_attr *attr = &event->attr;
 	struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
+	const u64 feat_spe_eft_bits = PMSFCR_EL1_LDm | PMSFCR_EL1_STm |
+				      PMSFCR_EL1_Bm | PMSFCR_EL1_SIMD |
+				      PMSFCR_EL1_SIMDm | PMSFCR_EL1_FP |
+				      PMSFCR_EL1_FPm;
 
 	/* This is, of course, deeply driver-specific */
 	if (attr->type != event->pmu->type)
@@ -761,6 +818,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 	    !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
 		return -EOPNOTSUPP;
 
+	if ((reg & feat_spe_eft_bits) &&
+	    !(spe_pmu->features & SPE_PMU_FEAT_EFT))
+		return -EOPNOTSUPP;
+
 	if (ATTR_CFG_GET_FLD(&event->attr, discard) &&
 	    !(spe_pmu->features & SPE_PMU_FEAT_DISCARD))
 		return -EOPNOTSUPP;
@@ -1052,6 +1113,9 @@ static void __arm_spe_pmu_dev_probe(void *info)
 	if (spe_pmu->pmsver >= ID_AA64DFR0_EL1_PMSVer_V1P2)
 		spe_pmu->features |= SPE_PMU_FEAT_DISCARD;
 
+	if (FIELD_GET(PMSIDR_EL1_EFT, reg))
+		spe_pmu->features |= SPE_PMU_FEAT_EFT;
+
 	/* This field has a spaced out encoding, so just use a look-up */
 	fld = FIELD_GET(PMSIDR_EL1_INTERVAL, reg);
 	switch (fld) {

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
                   ` (2 preceding siblings ...)
  2025-06-05 10:49 ` [PATCH v3 03/10] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering James Clark
@ 2025-06-05 10:49 ` James Clark
  2025-07-14 13:54   ` Will Deacon
  2025-06-05 10:49 ` [PATCH v3 05/10] KVM: arm64: Add trap configs for PMSDSFR_EL1 James Clark
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-06-05 10:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

SPE data source filtering (optional from Armv8.8) requires that traps to
the filter register PMSDSFR be disabled. Document the requirements and
disable the traps if the feature is present.

Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 Documentation/arch/arm64/booting.rst | 11 +++++++++++
 arch/arm64/include/asm/el2_setup.h   | 14 ++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
index dee7b6de864f..abd75085a239 100644
--- a/Documentation/arch/arm64/booting.rst
+++ b/Documentation/arch/arm64/booting.rst
@@ -404,6 +404,17 @@ Before jumping into the kernel, the following conditions must be met:
     - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
     - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
 
+  For CPUs with SPE data source filtering (FEAT_SPE_FDS):
+
+  - If EL3 is present:
+
+    - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
+    - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
+
   For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
 
   - If the kernel is entered at EL1 and EL2 is present:
diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 1e7c7475e43f..02b4a7fc016e 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -279,6 +279,20 @@
 	orr	x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
 	orr	x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
 .Lskip_pmuv3p9_\@:
+	mrs	x1, id_aa64dfr0_el1
+	ubfx	x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
+	/* If SPE is implemented, */
+	cmp	x1, #ID_AA64DFR0_EL1_PMSVer_IMP
+	b.lt	.Lskip_spefds_\@
+	/* we can read PMSIDR and */
+	mrs_s	x1, SYS_PMSIDR_EL1
+	and	x1, x1,  #PMSIDR_EL1_FDS
+	/* if FEAT_SPE_FDS is implemented, */
+	cbz	x1, .Lskip_spefds_\@
+	/* disable traps to PMSDSFR. */
+	orr	x0, x0, #HDFGRTR2_EL2_nPMSDSFR_EL1
+
+.Lskip_spefds_\@:
 	msr_s   SYS_HDFGRTR2_EL2, x0
 	msr_s   SYS_HDFGWTR2_EL2, x0
 	msr_s   SYS_HFGRTR2_EL2, xzr

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 05/10] KVM: arm64: Add trap configs for PMSDSFR_EL1
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
                   ` (3 preceding siblings ...)
  2025-06-05 10:49 ` [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS James Clark
@ 2025-06-05 10:49 ` James Clark
  2025-07-09  9:53   ` Joey Gouly
  2025-06-05 10:49 ` [PATCH v3 06/10] perf: Add perf_event_attr::config4 James Clark
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-06-05 10:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

SPE data source filtering (SPE_FEAT_FDS) adds a new register
PMSDSFR_EL1, add the trap configs for it. PMSNEVFR_EL1 was also missing
its VNCR offset so add it along with PMSDSFR_EL1.

Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 arch/arm64/include/asm/vncr_mapping.h | 2 ++
 arch/arm64/kvm/emulate-nested.c       | 1 +
 arch/arm64/kvm/sys_regs.c             | 1 +
 3 files changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/vncr_mapping.h b/arch/arm64/include/asm/vncr_mapping.h
index 6f556e993644..dba0e58a5fac 100644
--- a/arch/arm64/include/asm/vncr_mapping.h
+++ b/arch/arm64/include/asm/vncr_mapping.h
@@ -92,6 +92,8 @@
 #define VNCR_PMSICR_EL1         0x838
 #define VNCR_PMSIRR_EL1         0x840
 #define VNCR_PMSLATFR_EL1       0x848
+#define VNCR_PMSNEVFR_EL1       0x850
+#define VNCR_PMSDSFR_EL1        0x858
 #define VNCR_TRFCR_EL1          0x880
 #define VNCR_MPAM1_EL1          0x900
 #define VNCR_MPAMHCR_EL2        0x930
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 3a384e9660b8..60bd8b7f0e5b 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -1174,6 +1174,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
 	SR_TRAP(SYS_PMSIRR_EL1,		CGT_MDCR_TPMS),
 	SR_TRAP(SYS_PMSLATFR_EL1,	CGT_MDCR_TPMS),
 	SR_TRAP(SYS_PMSNEVFR_EL1,	CGT_MDCR_TPMS),
+	SR_TRAP(SYS_PMSDSFR_EL1,	CGT_MDCR_TPMS),
 	SR_TRAP(SYS_TRFCR_EL1,		CGT_MDCR_TTRF),
 	SR_TRAP(SYS_TRBBASER_EL1,	CGT_MDCR_E2TB),
 	SR_TRAP(SYS_TRBLIMITR_EL1,	CGT_MDCR_E2TB),
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index a6cf2888d150..4a88ba15c7df 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3008,6 +3008,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
 	{ SYS_DESC(SYS_PMBPTR_EL1), undef_access },
 	{ SYS_DESC(SYS_PMBSR_EL1), undef_access },
+	{ SYS_DESC(SYS_PMSDSFR_EL1), undef_access },
 	/* PMBIDR_EL1 is not trapped */
 
 	{ PMU_SYS_REG(PMINTENSET_EL1),

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 06/10] perf: Add perf_event_attr::config4
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
                   ` (4 preceding siblings ...)
  2025-06-05 10:49 ` [PATCH v3 05/10] KVM: arm64: Add trap configs for PMSDSFR_EL1 James Clark
@ 2025-06-05 10:49 ` James Clark
  2025-06-30 15:35   ` Ian Rogers
  2025-07-14 13:56   ` Will Deacon
  2025-06-05 10:49 ` [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source James Clark
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 37+ messages in thread
From: James Clark @ 2025-06-05 10:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

Arm FEAT_SPE_FDS adds the ability to filter on the data source of a
packet using another 64-bits of event filtering control. As the existing
perf_event_attr::configN fields are all used up for SPE PMU, an
additional field is needed. Add a new 'config4' field.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 include/uapi/linux/perf_event.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 78a362b80027..0d0ed85ad8cb 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -382,6 +382,7 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER6			120	/* Add: aux_sample_size */
 #define PERF_ATTR_SIZE_VER7			128	/* Add: sig_data */
 #define PERF_ATTR_SIZE_VER8			136	/* Add: config3 */
+#define PERF_ATTR_SIZE_VER9			144	/* add: config4 */
 
 /*
  * 'struct perf_event_attr' contains various attributes that define
@@ -543,6 +544,7 @@ struct perf_event_attr {
 	__u64	sig_data;
 
 	__u64	config3; /* extension of config2 */
+	__u64	config4; /* extension of config3 */
 };
 
 /*

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
                   ` (5 preceding siblings ...)
  2025-06-05 10:49 ` [PATCH v3 06/10] perf: Add perf_event_attr::config4 James Clark
@ 2025-06-05 10:49 ` James Clark
  2025-07-14 14:04   ` Will Deacon
  2025-06-05 10:49 ` [PATCH v3 08/10] tools headers UAPI: Sync linux/perf_event.h with the kernel sources James Clark
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-06-05 10:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

SPE_FEAT_FDS adds the ability to filter on the data source of packets.
Like the other existing filters, enable filtering with PMSFCR_EL1.FDS
when any of the filter bits are set.

Each bit maps to data sources 0-63 described by bits[0:5] in the data
source packet (although the full range of data source is 16 bits so
higher value data sources can't be filtered on). The filter is an OR of
all the bits, so for example setting bits 0 and 3 filters packets from
data sources 0 OR 3.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 drivers/perf/arm_spe_pmu.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 9309b846f642..d04318411f77 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -87,6 +87,7 @@ struct arm_spe_pmu {
 #define SPE_PMU_FEAT_INV_FILT_EVT		(1UL << 6)
 #define SPE_PMU_FEAT_DISCARD			(1UL << 7)
 #define SPE_PMU_FEAT_EFT			(1UL << 8)
+#define SPE_PMU_FEAT_FDS			(1UL << 9)
 #define SPE_PMU_FEAT_DEV_PROBED			(1UL << 63)
 	u64					features;
 
@@ -232,6 +233,10 @@ static const struct attribute_group arm_spe_pmu_cap_group = {
 #define ATTR_CFG_FLD_inv_event_filter_LO	0
 #define ATTR_CFG_FLD_inv_event_filter_HI	63
 
+#define ATTR_CFG_FLD_data_src_filter_CFG	config4	/* PMSDSFR_EL1 */
+#define ATTR_CFG_FLD_data_src_filter_LO	0
+#define ATTR_CFG_FLD_data_src_filter_HI	63
+
 GEN_PMU_FORMAT_ATTR(ts_enable);
 GEN_PMU_FORMAT_ATTR(pa_enable);
 GEN_PMU_FORMAT_ATTR(pct_enable);
@@ -248,6 +253,7 @@ GEN_PMU_FORMAT_ATTR(float_filter);
 GEN_PMU_FORMAT_ATTR(float_filter_mask);
 GEN_PMU_FORMAT_ATTR(event_filter);
 GEN_PMU_FORMAT_ATTR(inv_event_filter);
+GEN_PMU_FORMAT_ATTR(data_src_filter);
 GEN_PMU_FORMAT_ATTR(min_latency);
 GEN_PMU_FORMAT_ATTR(discard);
 
@@ -268,6 +274,7 @@ static struct attribute *arm_spe_pmu_formats_attr[] = {
 	&format_attr_float_filter_mask.attr,
 	&format_attr_event_filter.attr,
 	&format_attr_inv_event_filter.attr,
+	&format_attr_data_src_filter.attr,
 	&format_attr_min_latency.attr,
 	&format_attr_discard.attr,
 	NULL,
@@ -286,6 +293,9 @@ static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
 	if (attr == &format_attr_inv_event_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_INV_FILT_EVT))
 		return 0;
 
+	if (attr == &format_attr_data_src_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_FDS))
+		return 0;
+
 	if ((attr == &format_attr_branch_filter_mask.attr ||
 	     attr == &format_attr_load_filter_mask.attr ||
 	     attr == &format_attr_store_filter_mask.attr ||
@@ -406,6 +416,9 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
 	if (ATTR_CFG_GET_FLD(attr, inv_event_filter))
 		reg |= PMSFCR_EL1_FnE;
 
+	if (ATTR_CFG_GET_FLD(attr, data_src_filter))
+		reg |= PMSFCR_EL1_FDS;
+
 	if (ATTR_CFG_GET_FLD(attr, min_latency))
 		reg |= PMSFCR_EL1_FL;
 
@@ -430,6 +443,12 @@ static u64 arm_spe_event_to_pmslatfr(struct perf_event *event)
 	return FIELD_PREP(PMSLATFR_EL1_MINLAT, ATTR_CFG_GET_FLD(attr, min_latency));
 }
 
+static u64 arm_spe_event_to_pmsdsfr(struct perf_event *event)
+{
+	struct perf_event_attr *attr = &event->attr;
+	return ATTR_CFG_GET_FLD(attr, data_src_filter);
+}
+
 static void arm_spe_pmu_pad_buf(struct perf_output_handle *handle, int len)
 {
 	struct arm_spe_pmu_buf *buf = perf_get_aux(handle);
@@ -788,6 +807,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 	if (arm_spe_event_to_pmsnevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
 		return -EOPNOTSUPP;
 
+	if (arm_spe_event_to_pmsdsfr(event) &&
+	    !(spe_pmu->features & SPE_PMU_FEAT_FDS))
+		return -EOPNOTSUPP;
+
 	if (attr->exclude_idle)
 		return -EOPNOTSUPP;
 
@@ -857,6 +880,11 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags)
 		write_sysreg_s(reg, SYS_PMSNEVFR_EL1);
 	}
 
+	if (spe_pmu->features & SPE_PMU_FEAT_FDS) {
+		reg = arm_spe_event_to_pmsdsfr(event);
+		write_sysreg_s(reg, SYS_PMSDSFR_EL1);
+	}
+
 	reg = arm_spe_event_to_pmslatfr(event);
 	write_sysreg_s(reg, SYS_PMSLATFR_EL1);
 
@@ -1116,6 +1144,9 @@ static void __arm_spe_pmu_dev_probe(void *info)
 	if (FIELD_GET(PMSIDR_EL1_EFT, reg))
 		spe_pmu->features |= SPE_PMU_FEAT_EFT;
 
+	if (FIELD_GET(PMSIDR_EL1_FDS, reg))
+		spe_pmu->features |= SPE_PMU_FEAT_FDS;
+
 	/* This field has a spaced out encoding, so just use a look-up */
 	fld = FIELD_GET(PMSIDR_EL1_INTERVAL, reg);
 	switch (fld) {

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 08/10] tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
                   ` (6 preceding siblings ...)
  2025-06-05 10:49 ` [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source James Clark
@ 2025-06-05 10:49 ` James Clark
  2025-06-30 15:36   ` Ian Rogers
  2025-06-05 10:49 ` [PATCH v3 09/10] perf tools: Add support for perf_event_attr::config4 James Clark
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-06-05 10:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

To pickup config4 changes.

Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/include/uapi/linux/perf_event.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 78a362b80027..0d0ed85ad8cb 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -382,6 +382,7 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER6			120	/* Add: aux_sample_size */
 #define PERF_ATTR_SIZE_VER7			128	/* Add: sig_data */
 #define PERF_ATTR_SIZE_VER8			136	/* Add: config3 */
+#define PERF_ATTR_SIZE_VER9			144	/* add: config4 */
 
 /*
  * 'struct perf_event_attr' contains various attributes that define
@@ -543,6 +544,7 @@ struct perf_event_attr {
 	__u64	sig_data;
 
 	__u64	config3; /* extension of config2 */
+	__u64	config4; /* extension of config3 */
 };
 
 /*

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 09/10] perf tools: Add support for perf_event_attr::config4
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
                   ` (7 preceding siblings ...)
  2025-06-05 10:49 ` [PATCH v3 08/10] tools headers UAPI: Sync linux/perf_event.h with the kernel sources James Clark
@ 2025-06-05 10:49 ` James Clark
  2025-06-05 10:49 ` [PATCH v3 10/10] perf docs: arm-spe: Document new SPE filtering features James Clark
  2025-06-30 13:26 ` [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
  10 siblings, 0 replies; 37+ messages in thread
From: James Clark @ 2025-06-05 10:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

perf_event_attr has gained a new field, config4, so add support for it
extending the existing configN support.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/parse-events.c | 14 +++++++++++++-
 tools/perf/util/parse-events.c  | 11 +++++++++++
 tools/perf/util/parse-events.h  |  1 +
 tools/perf/util/parse-events.l  |  1 +
 tools/perf/util/pmu.c           |  8 ++++++++
 tools/perf/util/pmu.h           |  1 +
 6 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 5ec2e5607987..5f624a63d550 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -615,6 +615,8 @@ static int test__checkevent_pmu(struct evlist *evlist)
 	TEST_ASSERT_VAL("wrong config1",    1 == evsel->core.attr.config1);
 	TEST_ASSERT_VAL("wrong config2",    3 == evsel->core.attr.config2);
 	TEST_ASSERT_VAL("wrong config3",    0 == evsel->core.attr.config3);
+	TEST_ASSERT_VAL("wrong config4",    0 == evsel->core.attr.config4);
+
 	/*
 	 * The period value gets configured within evlist__config,
 	 * while this test executes only parse events method.
@@ -637,6 +639,7 @@ static int test__checkevent_list(struct evlist *evlist)
 		TEST_ASSERT_VAL("wrong config1", 0 == evsel->core.attr.config1);
 		TEST_ASSERT_VAL("wrong config2", 0 == evsel->core.attr.config2);
 		TEST_ASSERT_VAL("wrong config3", 0 == evsel->core.attr.config3);
+		TEST_ASSERT_VAL("wrong config4", 0 == evsel->core.attr.config4);
 		TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
 		TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
 		TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
@@ -813,6 +816,15 @@ static int test__checkterms_simple(struct parse_events_terms *terms)
 	TEST_ASSERT_VAL("wrong val", term->val.num == 4);
 	TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "config3"));
 
+	/* config4=5 */
+	term = list_entry(term->list.next, struct parse_events_term, list);
+	TEST_ASSERT_VAL("wrong type term",
+			term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG4);
+	TEST_ASSERT_VAL("wrong type val",
+			term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
+	TEST_ASSERT_VAL("wrong val", term->val.num == 5);
+	TEST_ASSERT_VAL("wrong config", !strcmp(term->config, "config4"));
+
 	/* umask=1*/
 	term = list_entry(term->list.next, struct parse_events_term, list);
 	TEST_ASSERT_VAL("wrong type term",
@@ -2451,7 +2463,7 @@ struct terms_test {
 
 static const struct terms_test test__terms[] = {
 	[0] = {
-		.str   = "config=10,config1,config2=3,config3=4,umask=1,read,r0xead",
+		.str   = "config=10,config1,config2=3,config3=4,config4=5,umask=1,read,r0xead",
 		.check = test__checkterms_simple,
 	},
 };
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 2380de56a207..2fccdf5879c8 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -291,6 +291,8 @@ __add_event(struct list_head *list, int *idx,
 						PERF_PMU_FORMAT_VALUE_CONFIG2, "config2");
 			perf_pmu__warn_invalid_config(pmu, attr->config3, name,
 						PERF_PMU_FORMAT_VALUE_CONFIG3, "config3");
+			perf_pmu__warn_invalid_config(pmu, attr->config4, name,
+						PERF_PMU_FORMAT_VALUE_CONFIG4, "config4");
 		}
 	} else {
 		is_pmu_core = (attr->type == PERF_TYPE_HARDWARE ||
@@ -848,6 +850,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
 		[PARSE_EVENTS__TERM_TYPE_CONFIG1]		= "config1",
 		[PARSE_EVENTS__TERM_TYPE_CONFIG2]		= "config2",
 		[PARSE_EVENTS__TERM_TYPE_CONFIG3]		= "config3",
+		[PARSE_EVENTS__TERM_TYPE_CONFIG4]		= "config4",
 		[PARSE_EVENTS__TERM_TYPE_NAME]			= "name",
 		[PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD]		= "period",
 		[PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ]		= "freq",
@@ -896,6 +899,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
 	case PARSE_EVENTS__TERM_TYPE_CONFIG1:
 	case PARSE_EVENTS__TERM_TYPE_CONFIG2:
 	case PARSE_EVENTS__TERM_TYPE_CONFIG3:
+	case PARSE_EVENTS__TERM_TYPE_CONFIG4:
 	case PARSE_EVENTS__TERM_TYPE_NAME:
 	case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
 	case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
@@ -965,6 +969,10 @@ do {									   \
 		CHECK_TYPE_VAL(NUM);
 		attr->config3 = term->val.num;
 		break;
+	case PARSE_EVENTS__TERM_TYPE_CONFIG4:
+		CHECK_TYPE_VAL(NUM);
+		attr->config4 = term->val.num;
+		break;
 	case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
 		CHECK_TYPE_VAL(NUM);
 		break;
@@ -1173,6 +1181,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
 	case PARSE_EVENTS__TERM_TYPE_CONFIG1:
 	case PARSE_EVENTS__TERM_TYPE_CONFIG2:
 	case PARSE_EVENTS__TERM_TYPE_CONFIG3:
+	case PARSE_EVENTS__TERM_TYPE_CONFIG4:
 	case PARSE_EVENTS__TERM_TYPE_NAME:
 	case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
 	case PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ:
@@ -1314,6 +1323,7 @@ do {								\
 		case PARSE_EVENTS__TERM_TYPE_CONFIG1:
 		case PARSE_EVENTS__TERM_TYPE_CONFIG2:
 		case PARSE_EVENTS__TERM_TYPE_CONFIG3:
+		case PARSE_EVENTS__TERM_TYPE_CONFIG4:
 		case PARSE_EVENTS__TERM_TYPE_NAME:
 		case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
 		case PARSE_EVENTS__TERM_TYPE_RAW:
@@ -1352,6 +1362,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
 		case PARSE_EVENTS__TERM_TYPE_CONFIG1:
 		case PARSE_EVENTS__TERM_TYPE_CONFIG2:
 		case PARSE_EVENTS__TERM_TYPE_CONFIG3:
+		case PARSE_EVENTS__TERM_TYPE_CONFIG4:
 		case PARSE_EVENTS__TERM_TYPE_NAME:
 		case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
 		case PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ:
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index ab242f671031..81af219cafb3 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -58,6 +58,7 @@ enum parse_events__term_type {
 	PARSE_EVENTS__TERM_TYPE_CONFIG1,
 	PARSE_EVENTS__TERM_TYPE_CONFIG2,
 	PARSE_EVENTS__TERM_TYPE_CONFIG3,
+	PARSE_EVENTS__TERM_TYPE_CONFIG4,
 	PARSE_EVENTS__TERM_TYPE_NAME,
 	PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
 	PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 4af7b9c1f44d..3f83aaf33a34 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -317,6 +317,7 @@ config			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG); }
 config1			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG1); }
 config2			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG2); }
 config3			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG3); }
+config4			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG4); }
 name			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NAME); }
 period			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
 freq			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ); }
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 609828513f6c..c7782916f932 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1508,6 +1508,10 @@ static int pmu_config_term(const struct perf_pmu *pmu,
 			assert(term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
 			pmu_format_value(bits, term->val.num, &attr->config3, zero);
 			break;
+		case PARSE_EVENTS__TERM_TYPE_CONFIG4:
+			assert(term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
+			pmu_format_value(bits, term->val.num, &attr->config4, zero);
+			break;
 		case PARSE_EVENTS__TERM_TYPE_USER: /* Not hardcoded. */
 			return -EINVAL;
 		case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_CPU:
@@ -1555,6 +1559,9 @@ static int pmu_config_term(const struct perf_pmu *pmu,
 	case PERF_PMU_FORMAT_VALUE_CONFIG3:
 		vp = &attr->config3;
 		break;
+	case PERF_PMU_FORMAT_VALUE_CONFIG4:
+		vp = &attr->config4;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -1875,6 +1882,7 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
 		"config1=0..0xffffffffffffffff",
 		"config2=0..0xffffffffffffffff",
 		"config3=0..0xffffffffffffffff",
+		"config4=0..0xffffffffffffffff",
 		"name=string",
 		"period=number",
 		"freq=number",
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 71b8636fd07d..49ac47e2b44f 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -23,6 +23,7 @@ enum {
 	PERF_PMU_FORMAT_VALUE_CONFIG1,
 	PERF_PMU_FORMAT_VALUE_CONFIG2,
 	PERF_PMU_FORMAT_VALUE_CONFIG3,
+	PERF_PMU_FORMAT_VALUE_CONFIG4,
 	PERF_PMU_FORMAT_VALUE_CONFIG_END,
 };
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 10/10] perf docs: arm-spe: Document new SPE filtering features
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
                   ` (8 preceding siblings ...)
  2025-06-05 10:49 ` [PATCH v3 09/10] perf tools: Add support for perf_event_attr::config4 James Clark
@ 2025-06-05 10:49 ` James Clark
  2025-06-30 15:38   ` Ian Rogers
  2025-06-30 13:26 ` [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
  10 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-06-05 10:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, James Clark

FEAT_SPE_EFT and FEAT_SPE_FDS etc have new user facing format attributes
so document them. Also document existing 'event_filter' bits that were
missing from the doc and the fact that latency values are stored in the
weight field.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-arm-spe.txt | 97 ++++++++++++++++++++++++++++---
 1 file changed, 88 insertions(+), 9 deletions(-)

diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
index 37afade4f1b2..4092b53b58d2 100644
--- a/tools/perf/Documentation/perf-arm-spe.txt
+++ b/tools/perf/Documentation/perf-arm-spe.txt
@@ -141,27 +141,65 @@ Config parameters
 These are placed between the // in the event and comma separated. For example '-e
 arm_spe/load_filter=1,min_latency=10/'
 
-  branch_filter=1     - collect branches only (PMSFCR.B)
-  event_filter=<mask> - filter on specific events (PMSEVFR) - see bitfield description below
+  event_filter=<mask> - logical AND filter on specific events (PMSEVFR) - see bitfield description below
+  inv_event_filter=<mask> - logical OR to filter out specific events (PMSNEVFR, FEAT_SPEv1p2) - see bitfield description below
   jitter=1            - use jitter to avoid resonance when sampling (PMSIRR.RND)
-  load_filter=1       - collect loads only (PMSFCR.LD)
   min_latency=<n>     - collect only samples with this latency or higher* (PMSLATFR)
   pa_enable=1         - collect physical address (as well as VA) of loads/stores (PMSCR.PA) - requires privilege
   pct_enable=1        - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
-  store_filter=1      - collect stores only (PMSFCR.ST)
   ts_enable=1         - enable timestamping with value of generic timer (PMSCR.TS)
   discard=1           - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
+  data_src_filter=<mask> - mask to filter from 0-63 possible data sources (PMSDSFR, FEAT_SPE_FDS) - See 'Data source filtering'
 
 +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
 than only the execution latency.
 
-Only some events can be filtered on; these include:
-
-  bit 1     - instruction retired (i.e. omit speculative instructions)
+Only some events can be filtered on using 'event_filter' bits. The overall
+filter is the logical AND of these bits, for example if bits 3 and 5 are set
+only samples that have both 'L1D cache refill' AND 'TLB walk' are recorded. When
+FEAT_SPEv1p2 is implemented 'inv_event_filter' can also be used to exclude
+events that have any (OR) of the filter's bits set. For example setting bits 3
+and 5 in 'inv_event_filter' will exclude any events that are either L1D cache
+refill OR TLB walk. If the same bit is set in both filters it's UNPREDICTABLE
+whether the sample is included or excluded. Filter bits for both event_filter
+and inv_event_filter are:
+
+  bit 1     - Instruction retired (i.e. omit speculative instructions)
+  bit 2     - L1D access (FEAT_SPEv1p4)
   bit 3     - L1D refill
+  bit 4     - TLB access (FEAT_SPEv1p4)
   bit 5     - TLB refill
-  bit 7     - mispredict
-  bit 11    - misaligned access
+  bit 6     - Not taken event (FEAT_SPEv1p2)
+  bit 7     - Mispredict
+  bit 8     - Last level cache access (FEAT_SPEv1p4)
+  bit 9     - Last level cache miss (FEAT_SPEv1p4)
+  bit 10    - Remote access (FEAT_SPEv1p4)
+  bit 11    - Misaligned access (FEAT_SPEv1p1)
+  bit 12-15 - IMPLEMENTATION DEFINED events (when implemented)
+  bit 16    - Transaction (FEAT_TME)
+  bit 17    - Partial or empty SME or SVE predicate (FEAT_SPEv1p1)
+  bit 18    - Empty SME or SVE predicate (FEAT_SPEv1p1)
+  bit 19    - L2D access (FEAT_SPEv1p4)
+  bit 20    - L2D miss (FEAT_SPEv1p4)
+  bit 21    - Cache data modified (FEAT_SPEv1p4)
+  bit 22    - Recently fetched (FEAT_SPEv1p4)
+  bit 23    - Data snooped (FEAT_SPEv1p4)
+  bit 24    - Streaming SVE mode event (when FEAT_SPE_SME is implemented), or
+              IMPLEMENTATION DEFINED event 24 (when implemented, only versions
+              less than FEAT_SPEv1p4)
+  bit 25    - SMCU or external coprocessor operation event when FEAT_SPE_SME is
+              implemented, or IMPLEMENTATION DEFINED event 25 (when implemented,
+              only versions less than FEAT_SPEv1p4)
+  bit 26-31 - IMPLEMENTATION DEFINED events (only versions less than FEAT_SPEv1p4)
+  bit 48-63 - IMPLEMENTATION DEFINED events (when implemented)
+
+For IMPLEMENTATION DEFINED bits, refer to the CPU TRM if these bits are
+implemented.
+
+The driver will reject events if requested filter bits require unimplemented SPE
+versions, but will not reject filter bits for unimplemented IMPDEF bits or when
+their related feature is not present (e.g. SME). For example, if FEAT_SPEv1p2 is
+not implemented, filtering on "Not taken event" (bit 6) will be rejected.
 
 So to sample just retired instructions:
 
@@ -171,6 +209,31 @@ or just mispredicted branches:
 
   perf record -e arm_spe/event_filter=0x80/ -- ./mybench
 
+When set, the following filters can be used to select samples that match any of
+the operation types (OR filtering). If only one is set then only samples of that
+type are collected:
+
+  branch_filter=1     - Collect branches (PMSFCR.B)
+  load_filter=1       - Collect loads (PMSFCR.LD)
+  store_filter=1      - Collect stores (PMSFCR.ST)
+
+When extended filtering is supported (FEAT_SPE_EFT), SIMD and float
+pointer operations can also be selected:
+
+  simd_filter=1         - Collect SIMD loads, stores and operations (PMSFCR.SIMD)
+  float_filter=1        - Collect floating point loads, stores and operations (PMSFCR.FP)
+
+When extended filtering is supported (FEAT_SPE_EFT), operation type filters can
+be changed to AND using _mask fields. For example samples could be selected if
+they are store AND SIMD by setting 'store_filter=1,simd_filter=1,
+store_filter_mask=1,simd_filter_mask=1'. The new masks are as follows:
+
+  branch_filter_mask=1  - Change branch filter behavior from OR to AND (PMSFCR.Bm)
+  load_filter_mask=1    - Change load filter behavior from OR to AND (PMSFCR.LDm)
+  store_filter_mask=1   - Change store filter behavior from OR to AND (PMSFCR.STm)
+  simd_filter_mask=1    - Change SIMD filter behavior from OR to AND (PMSFCR.SIMDm)
+  float_filter_mask=1   - Change floating point filter behavior from OR to AND (PMSFCR.FPm)
+
 Viewing the data
 ~~~~~~~~~~~~~~~~~
 
@@ -204,6 +267,10 @@ Memory access details are also stored on the samples and this can be viewed with
 
   perf report --mem-mode
 
+The latency value from the SPE sample is stored in the 'weight' field of the
+Perf samples and can be displayed in Perf script and report outputs by enabling
+its display from the command line.
+
 Common errors
 ~~~~~~~~~~~~~
 
@@ -247,6 +314,18 @@ to minimize output. Then run perf stat:
   perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
   perf stat -e SAMPLE_FEED_LD
 
+Data source filtering
+~~~~~~~~~~~~~~~~~~~~~
+
+When FEAT_SPE_FDS is present, 'data_src_filter' can be used as a mask to filter
+on a subset (0 - 63) of possible data source IDs. The full range of data sources
+is 0 - 65535 although these are unlikely to be used in practice. Data sources
+are IMPDEF so refer to the TRM for the mappings. Each bit N of the filter maps
+to data source N. The filter is an OR of all the bits, so for example setting
+bits 0 and 3 includes only packets from data sources 0 OR 3. When
+'data_src_filter' is set to 0 data source filtering is disabled and all data
+sources are included.
+
 SEE ALSO
 --------
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 01/10] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register
  2025-06-05 10:48 ` [PATCH v3 01/10] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register James Clark
@ 2025-06-12  6:44   ` Anshuman Khandual
  2025-07-14 13:32   ` Will Deacon
  1 sibling, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2025-06-12  6:44 UTC (permalink / raw)
  To: James Clark, Catalin Marinas, Will Deacon, Mark Rutland,
	Jonathan Corbet, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm

On 05/06/25 4:18 PM, James Clark wrote:
> Add new fields and register that are introduced for the features
> FEAT_SPE_EFT (extended filtering) and FEAT_SPE_FDS (data source
> filtering).
> 
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  arch/arm64/tools/sysreg | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> index 8a8cf6874298..2e897d8a4040 100644
> --- a/arch/arm64/tools/sysreg
> +++ b/arch/arm64/tools/sysreg
> @@ -2859,11 +2859,20 @@ Field	0	RND
>  EndSysreg
>  
>  Sysreg	PMSFCR_EL1	3	0	9	9	4
> -Res0	63:19
> +Res0	63:53
> +Field	52	SIMDm
> +Field	51	FPm
> +Field	50	STm
> +Field	49	LDm
> +Field	48	Bm
> +Res0	47:21
> +Field	20	SIMD
> +Field	19	FP
>  Field	18	ST
>  Field	17	LD
>  Field	16	B
> -Res0	15:4
> +Res0	15:5
> +Field	4	FDS
>  Field	3	FnE
>  Field	2	FL
>  Field	1	FT
> 

These register fields checks out against ARM ARM DDI 0487 (L.b)

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters
  2025-06-05 10:49 ` [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters James Clark
@ 2025-06-12  7:35   ` Anshuman Khandual
  2025-06-12  8:42     ` James Clark
  2025-07-14 13:26   ` Will Deacon
  1 sibling, 1 reply; 37+ messages in thread
From: Anshuman Khandual @ 2025-06-12  7:35 UTC (permalink / raw)
  To: James Clark, Catalin Marinas, Will Deacon, Mark Rutland,
	Jonathan Corbet, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, leo.yan
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm

On 05/06/25 4:19 PM, James Clark wrote:
> FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits, so
> remove them from the previous version's RES0 bits using
> PMSEVFR_EL1_RES0_V1P4_EXCL. It also makes some previously available bits
> unavailable again, so add those back using PMSEVFR_EL1_RES0_V1P4_INCL.

Just wondering - why cannot all the new applicable filter bits be added
explicitly for PMSEVFR_EL1_RES0_V1P4 without using exclude and include
intermediaries.

> E.g:
> 
>   E[30], bit [30]
>   When FEAT_SPEv1p4 is _not_ implemented ...
> 
> FEAT_SPE_V1P3 has the same filters as V1P2 so explicitly add it to the
> switch.
A small nit - should FEAT_SPE_V1P3 addition be done in a previous patch
as it is an already existing thing ?

> 
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  arch/arm64/include/asm/sysreg.h | 7 +++++++
>  drivers/perf/arm_spe_pmu.c      | 5 ++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index f1bb0d10c39a..880090df3efc 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -358,6 +358,13 @@
>  	(PMSEVFR_EL1_RES0_IMP & ~(BIT_ULL(18) | BIT_ULL(17) | BIT_ULL(11)))
>  #define PMSEVFR_EL1_RES0_V1P2	\
>  	(PMSEVFR_EL1_RES0_V1P1 & ~BIT_ULL(6))
> +#define PMSEVFR_EL1_RES0_V1P4_EXCL \
> +	(BIT_ULL(2) | BIT_ULL(4) | GENMASK_ULL(10, 8) | GENMASK_ULL(23, 19))
> +#define PMSEVFR_EL1_RES0_V1P4_INCL \
> +	(GENMASK_ULL(31, 26))> +#define PMSEVFR_EL1_RES0_V1P4	\
> +	(PMSEVFR_EL1_RES0_V1P4_INCL | \
> +	(PMSEVFR_EL1_RES0_V1P2 & ~PMSEVFR_EL1_RES0_V1P4_EXCL))
>  
>  /* Buffer error reporting */
>  #define PMBSR_EL1_FAULT_FSC_SHIFT	PMBSR_EL1_MSS_SHIFT
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 3efed8839a4e..d9f6d229dce8 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -701,9 +701,12 @@ static u64 arm_spe_pmsevfr_res0(u16 pmsver)
>  	case ID_AA64DFR0_EL1_PMSVer_V1P1:
>  		return PMSEVFR_EL1_RES0_V1P1;
>  	case ID_AA64DFR0_EL1_PMSVer_V1P2:
> +	case ID_AA64DFR0_EL1_PMSVer_V1P3:
> +		return PMSEVFR_EL1_RES0_V1P2;
> +	case ID_AA64DFR0_EL1_PMSVer_V1P4:
>  	/* Return the highest version we support in default */
>  	default:
> -		return PMSEVFR_EL1_RES0_V1P2;
> +		return PMSEVFR_EL1_RES0_V1P4;
>  	}
>  }
>  
> 





^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters
  2025-06-12  7:35   ` Anshuman Khandual
@ 2025-06-12  8:42     ` James Clark
  0 siblings, 0 replies; 37+ messages in thread
From: James Clark @ 2025-06-12  8:42 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, Catalin Marinas, Will Deacon, Mark Rutland,
	Jonathan Corbet, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, leo.yan



On 12/06/2025 8:35 am, Anshuman Khandual wrote:
> On 05/06/25 4:19 PM, James Clark wrote:
>> FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits, so
>> remove them from the previous version's RES0 bits using
>> PMSEVFR_EL1_RES0_V1P4_EXCL. It also makes some previously available bits
>> unavailable again, so add those back using PMSEVFR_EL1_RES0_V1P4_INCL.
> 
> Just wondering - why cannot all the new applicable filter bits be added
> explicitly for PMSEVFR_EL1_RES0_V1P4 without using exclude and include
> intermediaries.
> 

They could but there would be a lot of duplication. Each version tended 
to add only a few bits to the previous version. Also for consistency, 
they were already defined in this way. I didn't think there was much to 
gain by redefining the whole bitset just for this one, it's probably 
going to look just as messy.

>> E.g:
>>
>>    E[30], bit [30]
>>    When FEAT_SPEv1p4 is _not_ implemented ...
>>
>> FEAT_SPE_V1P3 has the same filters as V1P2 so explicitly add it to the
>> switch.
> A small nit - should FEAT_SPE_V1P3 addition be done in a previous patch
> as it is an already existing thing ?
> 

It's related to this patch because I change the default case. Before 
V1P3 hits 'default:' and returns PMSEVFR_EL1_RES0_V1P2. But now the 
highest supported is PMSEVFR_EL1_RES0_V1P4 for the default case so I 
need to add a case for V1P3 to keep it returning PMSEVFR_EL1_RES0_V1P2 
filters. There's no bug.

>>
>> Reviewed-by: Leo Yan <leo.yan@arm.com>
>> Tested-by: Leo Yan <leo.yan@arm.com>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   arch/arm64/include/asm/sysreg.h | 7 +++++++
>>   drivers/perf/arm_spe_pmu.c      | 5 ++++-
>>   2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index f1bb0d10c39a..880090df3efc 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -358,6 +358,13 @@
>>   	(PMSEVFR_EL1_RES0_IMP & ~(BIT_ULL(18) | BIT_ULL(17) | BIT_ULL(11)))
>>   #define PMSEVFR_EL1_RES0_V1P2	\
>>   	(PMSEVFR_EL1_RES0_V1P1 & ~BIT_ULL(6))
>> +#define PMSEVFR_EL1_RES0_V1P4_EXCL \
>> +	(BIT_ULL(2) | BIT_ULL(4) | GENMASK_ULL(10, 8) | GENMASK_ULL(23, 19))
>> +#define PMSEVFR_EL1_RES0_V1P4_INCL \
>> +	(GENMASK_ULL(31, 26))> +#define PMSEVFR_EL1_RES0_V1P4	\
>> +	(PMSEVFR_EL1_RES0_V1P4_INCL | \
>> +	(PMSEVFR_EL1_RES0_V1P2 & ~PMSEVFR_EL1_RES0_V1P4_EXCL))
>>   
>>   /* Buffer error reporting */
>>   #define PMBSR_EL1_FAULT_FSC_SHIFT	PMBSR_EL1_MSS_SHIFT
>> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> index 3efed8839a4e..d9f6d229dce8 100644
>> --- a/drivers/perf/arm_spe_pmu.c
>> +++ b/drivers/perf/arm_spe_pmu.c
>> @@ -701,9 +701,12 @@ static u64 arm_spe_pmsevfr_res0(u16 pmsver)
>>   	case ID_AA64DFR0_EL1_PMSVer_V1P1:
>>   		return PMSEVFR_EL1_RES0_V1P1;
>>   	case ID_AA64DFR0_EL1_PMSVer_V1P2:
>> +	case ID_AA64DFR0_EL1_PMSVer_V1P3:
>> +		return PMSEVFR_EL1_RES0_V1P2;
>> +	case ID_AA64DFR0_EL1_PMSVer_V1P4:
>>   	/* Return the highest version we support in default */
>>   	default:
>> -		return PMSEVFR_EL1_RES0_V1P2;
>> +		return PMSEVFR_EL1_RES0_V1P4;
>>   	}
>>   }
>>   
>>
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features
  2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
                   ` (9 preceding siblings ...)
  2025-06-05 10:49 ` [PATCH v3 10/10] perf docs: arm-spe: Document new SPE filtering features James Clark
@ 2025-06-30 13:26 ` James Clark
  10 siblings, 0 replies; 37+ messages in thread
From: James Clark @ 2025-06-30 13:26 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Marc Zyngier, Arnaldo Carvalho de Melo,
	Namhyung Kim
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, linux-doc,
	kvmarm, Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, leo.yan, Jonathan Corbet,
	Catalin Marinas



On 05/06/2025 11:48 am, James Clark wrote:
> Support 3 new SPE features: FEAT_SPEv1p4 filters, FEAT_SPE_EFT extended
> filtering, and SPE_FEAT_FDS data source filtering. The features are
> independent can be applied separately:
> 
>    * Prerequisite sysreg changes - patches 1 - 2
>    * FEAT_SPEv1p4 - patch 3
>    * FEAT_SPE_EFT - patch 4
>    * FEAT_SPE_FDS - patches 5 - 8
>    * FEAT_SPE_FDS Perf tool changes - patches 9 - 11
> 
> The first two features will work with old Perfs but a Perf change to
> parse the new config4 is required for the last feature.
> 
> To:
> 
> ---
> Changes in v3:
> - Use PMSIDR_EL1_FDS instead of 1 << PMSIDR_EL1_FDS_SHIFT
> - Add VNCR offsets
> - Link to v2: https://lore.kernel.org/r/20250529-james-perf-feat_spe_eft-v2-0-a01a9baad06a@linaro.org
> 
> Changes in v2:
> - Fix detection of FEAT_SPE_FDS in el2_setup.h
> - Pickup Marc Z's sysreg change instead which matches the json
> - Restructure and expand docs changes
> - Link to v1: https://lore.kernel.org/r/20250506-james-perf-feat_spe_eft-v1-0-dd480e8e4851@linaro.org
> 
> ---
> James Clark (10):
>        arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register
>        perf: arm_spe: Support FEAT_SPEv1p4 filters
>        perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering
>        arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
>        KVM: arm64: Add trap configs for PMSDSFR_EL1
>        perf: Add perf_event_attr::config4
>        perf: arm_spe: Add support for filtering on data source
>        tools headers UAPI: Sync linux/perf_event.h with the kernel sources
>        perf tools: Add support for perf_event_attr::config4
>        perf docs: arm-spe: Document new SPE filtering features
> 
>   Documentation/arch/arm64/booting.rst      |  11 ++++
>   arch/arm64/include/asm/el2_setup.h        |  14 +++++
>   arch/arm64/include/asm/sysreg.h           |   7 +++
>   arch/arm64/include/asm/vncr_mapping.h     |   2 +
>   arch/arm64/kvm/emulate-nested.c           |   1 +
>   arch/arm64/kvm/sys_regs.c                 |   1 +
>   arch/arm64/tools/sysreg                   |  13 +++-
>   drivers/perf/arm_spe_pmu.c                | 100 +++++++++++++++++++++++++++++-
>   include/uapi/linux/perf_event.h           |   2 +
>   tools/include/uapi/linux/perf_event.h     |   2 +
>   tools/perf/Documentation/perf-arm-spe.txt |  97 ++++++++++++++++++++++++++---
>   tools/perf/tests/parse-events.c           |  14 ++++-
>   tools/perf/util/parse-events.c            |  11 ++++
>   tools/perf/util/parse-events.h            |   1 +
>   tools/perf/util/parse-events.l            |   1 +
>   tools/perf/util/pmu.c                     |   8 +++
>   tools/perf/util/pmu.h                     |   1 +
>   17 files changed, 273 insertions(+), 13 deletions(-)
> ---
> base-commit: ec7714e4947909190ffb3041a03311a975350fe0
> change-id: 20250312-james-perf-feat_spe_eft-66cdf4d8fe99
> 
> Best regards,

Gentle ping, thanks


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] perf: Add perf_event_attr::config4
  2025-06-05 10:49 ` [PATCH v3 06/10] perf: Add perf_event_attr::config4 James Clark
@ 2025-06-30 15:35   ` Ian Rogers
  2025-07-14 13:56   ` Will Deacon
  1 sibling, 0 replies; 37+ messages in thread
From: Ian Rogers @ 2025-06-30 15:35 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	leo.yan, linux-arm-kernel, linux-kernel, linux-perf-users,
	linux-doc, kvmarm

On Thu, Jun 5, 2025 at 3:50 AM James Clark <james.clark@linaro.org> wrote:
>
> Arm FEAT_SPE_FDS adds the ability to filter on the data source of a
> packet using another 64-bits of event filtering control. As the existing
> perf_event_attr::configN fields are all used up for SPE PMU, an
> additional field is needed. Add a new 'config4' field.
>
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>

Reviewed-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  include/uapi/linux/perf_event.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 78a362b80027..0d0ed85ad8cb 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -382,6 +382,7 @@ enum perf_event_read_format {
>  #define PERF_ATTR_SIZE_VER6                    120     /* Add: aux_sample_size */
>  #define PERF_ATTR_SIZE_VER7                    128     /* Add: sig_data */
>  #define PERF_ATTR_SIZE_VER8                    136     /* Add: config3 */
> +#define PERF_ATTR_SIZE_VER9                    144     /* add: config4 */
>
>  /*
>   * 'struct perf_event_attr' contains various attributes that define
> @@ -543,6 +544,7 @@ struct perf_event_attr {
>         __u64   sig_data;
>
>         __u64   config3; /* extension of config2 */
> +       __u64   config4; /* extension of config3 */
>  };
>
>  /*
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 08/10] tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  2025-06-05 10:49 ` [PATCH v3 08/10] tools headers UAPI: Sync linux/perf_event.h with the kernel sources James Clark
@ 2025-06-30 15:36   ` Ian Rogers
  0 siblings, 0 replies; 37+ messages in thread
From: Ian Rogers @ 2025-06-30 15:36 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	leo.yan, linux-arm-kernel, linux-kernel, linux-perf-users,
	linux-doc, kvmarm

On Thu, Jun 5, 2025 at 3:50 AM James Clark <james.clark@linaro.org> wrote:
>
> To pickup config4 changes.
>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>

Reviewed-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/include/uapi/linux/perf_event.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index 78a362b80027..0d0ed85ad8cb 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -382,6 +382,7 @@ enum perf_event_read_format {
>  #define PERF_ATTR_SIZE_VER6                    120     /* Add: aux_sample_size */
>  #define PERF_ATTR_SIZE_VER7                    128     /* Add: sig_data */
>  #define PERF_ATTR_SIZE_VER8                    136     /* Add: config3 */
> +#define PERF_ATTR_SIZE_VER9                    144     /* add: config4 */
>
>  /*
>   * 'struct perf_event_attr' contains various attributes that define
> @@ -543,6 +544,7 @@ struct perf_event_attr {
>         __u64   sig_data;
>
>         __u64   config3; /* extension of config2 */
> +       __u64   config4; /* extension of config3 */
>  };
>
>  /*
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 10/10] perf docs: arm-spe: Document new SPE filtering features
  2025-06-05 10:49 ` [PATCH v3 10/10] perf docs: arm-spe: Document new SPE filtering features James Clark
@ 2025-06-30 15:38   ` Ian Rogers
  0 siblings, 0 replies; 37+ messages in thread
From: Ian Rogers @ 2025-06-30 15:38 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	leo.yan, linux-arm-kernel, linux-kernel, linux-perf-users,
	linux-doc, kvmarm

On Thu, Jun 5, 2025 at 3:50 AM James Clark <james.clark@linaro.org> wrote:
>
> FEAT_SPE_EFT and FEAT_SPE_FDS etc have new user facing format attributes
> so document them. Also document existing 'event_filter' bits that were
> missing from the doc and the fact that latency values are stored in the
> weight field.
>
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>

Reviewed-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/Documentation/perf-arm-spe.txt | 97 ++++++++++++++++++++++++++++---
>  1 file changed, 88 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
> index 37afade4f1b2..4092b53b58d2 100644
> --- a/tools/perf/Documentation/perf-arm-spe.txt
> +++ b/tools/perf/Documentation/perf-arm-spe.txt
> @@ -141,27 +141,65 @@ Config parameters
>  These are placed between the // in the event and comma separated. For example '-e
>  arm_spe/load_filter=1,min_latency=10/'
>
> -  branch_filter=1     - collect branches only (PMSFCR.B)
> -  event_filter=<mask> - filter on specific events (PMSEVFR) - see bitfield description below
> +  event_filter=<mask> - logical AND filter on specific events (PMSEVFR) - see bitfield description below
> +  inv_event_filter=<mask> - logical OR to filter out specific events (PMSNEVFR, FEAT_SPEv1p2) - see bitfield description below
>    jitter=1            - use jitter to avoid resonance when sampling (PMSIRR.RND)
> -  load_filter=1       - collect loads only (PMSFCR.LD)
>    min_latency=<n>     - collect only samples with this latency or higher* (PMSLATFR)
>    pa_enable=1         - collect physical address (as well as VA) of loads/stores (PMSCR.PA) - requires privilege
>    pct_enable=1        - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
> -  store_filter=1      - collect stores only (PMSFCR.ST)
>    ts_enable=1         - enable timestamping with value of generic timer (PMSCR.TS)
>    discard=1           - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
> +  data_src_filter=<mask> - mask to filter from 0-63 possible data sources (PMSDSFR, FEAT_SPE_FDS) - See 'Data source filtering'
>
>  +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
>  than only the execution latency.
>
> -Only some events can be filtered on; these include:
> -
> -  bit 1     - instruction retired (i.e. omit speculative instructions)
> +Only some events can be filtered on using 'event_filter' bits. The overall
> +filter is the logical AND of these bits, for example if bits 3 and 5 are set
> +only samples that have both 'L1D cache refill' AND 'TLB walk' are recorded. When
> +FEAT_SPEv1p2 is implemented 'inv_event_filter' can also be used to exclude
> +events that have any (OR) of the filter's bits set. For example setting bits 3
> +and 5 in 'inv_event_filter' will exclude any events that are either L1D cache
> +refill OR TLB walk. If the same bit is set in both filters it's UNPREDICTABLE
> +whether the sample is included or excluded. Filter bits for both event_filter
> +and inv_event_filter are:
> +
> +  bit 1     - Instruction retired (i.e. omit speculative instructions)
> +  bit 2     - L1D access (FEAT_SPEv1p4)
>    bit 3     - L1D refill
> +  bit 4     - TLB access (FEAT_SPEv1p4)
>    bit 5     - TLB refill
> -  bit 7     - mispredict
> -  bit 11    - misaligned access
> +  bit 6     - Not taken event (FEAT_SPEv1p2)
> +  bit 7     - Mispredict
> +  bit 8     - Last level cache access (FEAT_SPEv1p4)
> +  bit 9     - Last level cache miss (FEAT_SPEv1p4)
> +  bit 10    - Remote access (FEAT_SPEv1p4)
> +  bit 11    - Misaligned access (FEAT_SPEv1p1)
> +  bit 12-15 - IMPLEMENTATION DEFINED events (when implemented)
> +  bit 16    - Transaction (FEAT_TME)
> +  bit 17    - Partial or empty SME or SVE predicate (FEAT_SPEv1p1)
> +  bit 18    - Empty SME or SVE predicate (FEAT_SPEv1p1)
> +  bit 19    - L2D access (FEAT_SPEv1p4)
> +  bit 20    - L2D miss (FEAT_SPEv1p4)
> +  bit 21    - Cache data modified (FEAT_SPEv1p4)
> +  bit 22    - Recently fetched (FEAT_SPEv1p4)
> +  bit 23    - Data snooped (FEAT_SPEv1p4)
> +  bit 24    - Streaming SVE mode event (when FEAT_SPE_SME is implemented), or
> +              IMPLEMENTATION DEFINED event 24 (when implemented, only versions
> +              less than FEAT_SPEv1p4)
> +  bit 25    - SMCU or external coprocessor operation event when FEAT_SPE_SME is
> +              implemented, or IMPLEMENTATION DEFINED event 25 (when implemented,
> +              only versions less than FEAT_SPEv1p4)
> +  bit 26-31 - IMPLEMENTATION DEFINED events (only versions less than FEAT_SPEv1p4)
> +  bit 48-63 - IMPLEMENTATION DEFINED events (when implemented)
> +
> +For IMPLEMENTATION DEFINED bits, refer to the CPU TRM if these bits are
> +implemented.
> +
> +The driver will reject events if requested filter bits require unimplemented SPE
> +versions, but will not reject filter bits for unimplemented IMPDEF bits or when
> +their related feature is not present (e.g. SME). For example, if FEAT_SPEv1p2 is
> +not implemented, filtering on "Not taken event" (bit 6) will be rejected.
>
>  So to sample just retired instructions:
>
> @@ -171,6 +209,31 @@ or just mispredicted branches:
>
>    perf record -e arm_spe/event_filter=0x80/ -- ./mybench
>
> +When set, the following filters can be used to select samples that match any of
> +the operation types (OR filtering). If only one is set then only samples of that
> +type are collected:
> +
> +  branch_filter=1     - Collect branches (PMSFCR.B)
> +  load_filter=1       - Collect loads (PMSFCR.LD)
> +  store_filter=1      - Collect stores (PMSFCR.ST)
> +
> +When extended filtering is supported (FEAT_SPE_EFT), SIMD and float
> +pointer operations can also be selected:
> +
> +  simd_filter=1         - Collect SIMD loads, stores and operations (PMSFCR.SIMD)
> +  float_filter=1        - Collect floating point loads, stores and operations (PMSFCR.FP)
> +
> +When extended filtering is supported (FEAT_SPE_EFT), operation type filters can
> +be changed to AND using _mask fields. For example samples could be selected if
> +they are store AND SIMD by setting 'store_filter=1,simd_filter=1,
> +store_filter_mask=1,simd_filter_mask=1'. The new masks are as follows:
> +
> +  branch_filter_mask=1  - Change branch filter behavior from OR to AND (PMSFCR.Bm)
> +  load_filter_mask=1    - Change load filter behavior from OR to AND (PMSFCR.LDm)
> +  store_filter_mask=1   - Change store filter behavior from OR to AND (PMSFCR.STm)
> +  simd_filter_mask=1    - Change SIMD filter behavior from OR to AND (PMSFCR.SIMDm)
> +  float_filter_mask=1   - Change floating point filter behavior from OR to AND (PMSFCR.FPm)
> +
>  Viewing the data
>  ~~~~~~~~~~~~~~~~~
>
> @@ -204,6 +267,10 @@ Memory access details are also stored on the samples and this can be viewed with
>
>    perf report --mem-mode
>
> +The latency value from the SPE sample is stored in the 'weight' field of the
> +Perf samples and can be displayed in Perf script and report outputs by enabling
> +its display from the command line.
> +
>  Common errors
>  ~~~~~~~~~~~~~
>
> @@ -247,6 +314,18 @@ to minimize output. Then run perf stat:
>    perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
>    perf stat -e SAMPLE_FEED_LD
>
> +Data source filtering
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +When FEAT_SPE_FDS is present, 'data_src_filter' can be used as a mask to filter
> +on a subset (0 - 63) of possible data source IDs. The full range of data sources
> +is 0 - 65535 although these are unlikely to be used in practice. Data sources
> +are IMPDEF so refer to the TRM for the mappings. Each bit N of the filter maps
> +to data source N. The filter is an OR of all the bits, so for example setting
> +bits 0 and 3 includes only packets from data sources 0 OR 3. When
> +'data_src_filter' is set to 0 data source filtering is disabled and all data
> +sources are included.
> +
>  SEE ALSO
>  --------
>
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm64: Add trap configs for PMSDSFR_EL1
  2025-06-05 10:49 ` [PATCH v3 05/10] KVM: arm64: Add trap configs for PMSDSFR_EL1 James Clark
@ 2025-07-09  9:53   ` Joey Gouly
  0 siblings, 0 replies; 37+ messages in thread
From: Joey Gouly @ 2025-07-09  9:53 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

Hi James!

On Thu, Jun 05, 2025 at 11:49:03AM +0100, James Clark wrote:
> SPE data source filtering (SPE_FEAT_FDS) adds a new register
> PMSDSFR_EL1, add the trap configs for it. PMSNEVFR_EL1 was also missing
> its VNCR offset so add it along with PMSDSFR_EL1.
> 

A (well behaving) guest won't try access this because:
The presence of PMSDSFR_EL1 is checked by accessing PMSDSFR_EL1.FDS, which is
currently `undef_access` and we hide SPE from the guest in
read_sanitised_id_aa64dfr0_el1().

So makes sense it is just undef!

Reviewed-by: Joey Gouly <joey.gouly@arm.com>

Thanks,
Joey

> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  arch/arm64/include/asm/vncr_mapping.h | 2 ++
>  arch/arm64/kvm/emulate-nested.c       | 1 +
>  arch/arm64/kvm/sys_regs.c             | 1 +
>  3 files changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/vncr_mapping.h b/arch/arm64/include/asm/vncr_mapping.h
> index 6f556e993644..dba0e58a5fac 100644
> --- a/arch/arm64/include/asm/vncr_mapping.h
> +++ b/arch/arm64/include/asm/vncr_mapping.h
> @@ -92,6 +92,8 @@
>  #define VNCR_PMSICR_EL1         0x838
>  #define VNCR_PMSIRR_EL1         0x840
>  #define VNCR_PMSLATFR_EL1       0x848
> +#define VNCR_PMSNEVFR_EL1       0x850
> +#define VNCR_PMSDSFR_EL1        0x858
>  #define VNCR_TRFCR_EL1          0x880
>  #define VNCR_MPAM1_EL1          0x900
>  #define VNCR_MPAMHCR_EL2        0x930
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> index 3a384e9660b8..60bd8b7f0e5b 100644
> --- a/arch/arm64/kvm/emulate-nested.c
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -1174,6 +1174,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
>  	SR_TRAP(SYS_PMSIRR_EL1,		CGT_MDCR_TPMS),
>  	SR_TRAP(SYS_PMSLATFR_EL1,	CGT_MDCR_TPMS),
>  	SR_TRAP(SYS_PMSNEVFR_EL1,	CGT_MDCR_TPMS),
> +	SR_TRAP(SYS_PMSDSFR_EL1,	CGT_MDCR_TPMS),
>  	SR_TRAP(SYS_TRFCR_EL1,		CGT_MDCR_TTRF),
>  	SR_TRAP(SYS_TRBBASER_EL1,	CGT_MDCR_E2TB),
>  	SR_TRAP(SYS_TRBLIMITR_EL1,	CGT_MDCR_E2TB),
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index a6cf2888d150..4a88ba15c7df 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -3008,6 +3008,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
>  	{ SYS_DESC(SYS_PMBPTR_EL1), undef_access },
>  	{ SYS_DESC(SYS_PMBSR_EL1), undef_access },
> +	{ SYS_DESC(SYS_PMSDSFR_EL1), undef_access },
>  	/* PMBIDR_EL1 is not trapped */
>  
>  	{ PMU_SYS_REG(PMINTENSET_EL1),
> 
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters
  2025-06-05 10:49 ` [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters James Clark
  2025-06-12  7:35   ` Anshuman Khandual
@ 2025-07-14 13:26   ` Will Deacon
  2025-07-15 11:23     ` James Clark
  1 sibling, 1 reply; 37+ messages in thread
From: Will Deacon @ 2025-07-14 13:26 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Thu, Jun 05, 2025 at 11:49:00AM +0100, James Clark wrote:
> FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits, so
> remove them from the previous version's RES0 bits using
> PMSEVFR_EL1_RES0_V1P4_EXCL. It also makes some previously available bits
> unavailable again, so add those back using PMSEVFR_EL1_RES0_V1P4_INCL.
> E.g:
> 
>   E[30], bit [30]
>   When FEAT_SPEv1p4 is _not_ implemented ...
> 
> FEAT_SPE_V1P3 has the same filters as V1P2 so explicitly add it to the
> switch.
> 
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  arch/arm64/include/asm/sysreg.h | 7 +++++++
>  drivers/perf/arm_spe_pmu.c      | 5 ++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index f1bb0d10c39a..880090df3efc 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -358,6 +358,13 @@
>  	(PMSEVFR_EL1_RES0_IMP & ~(BIT_ULL(18) | BIT_ULL(17) | BIT_ULL(11)))
>  #define PMSEVFR_EL1_RES0_V1P2	\
>  	(PMSEVFR_EL1_RES0_V1P1 & ~BIT_ULL(6))
> +#define PMSEVFR_EL1_RES0_V1P4_EXCL \
> +	(BIT_ULL(2) | BIT_ULL(4) | GENMASK_ULL(10, 8) | GENMASK_ULL(23, 19))
> +#define PMSEVFR_EL1_RES0_V1P4_INCL \
> +	(GENMASK_ULL(31, 26))
> +#define PMSEVFR_EL1_RES0_V1P4	\
> +	(PMSEVFR_EL1_RES0_V1P4_INCL | \
> +	(PMSEVFR_EL1_RES0_V1P2 & ~PMSEVFR_EL1_RES0_V1P4_EXCL))
>  
>  /* Buffer error reporting */
>  #define PMBSR_EL1_FAULT_FSC_SHIFT	PMBSR_EL1_MSS_SHIFT
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 3efed8839a4e..d9f6d229dce8 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -701,9 +701,12 @@ static u64 arm_spe_pmsevfr_res0(u16 pmsver)
>  	case ID_AA64DFR0_EL1_PMSVer_V1P1:
>  		return PMSEVFR_EL1_RES0_V1P1;
>  	case ID_AA64DFR0_EL1_PMSVer_V1P2:
> +	case ID_AA64DFR0_EL1_PMSVer_V1P3:
> +		return PMSEVFR_EL1_RES0_V1P2;
> +	case ID_AA64DFR0_EL1_PMSVer_V1P4:
>  	/* Return the highest version we support in default */
>  	default:
> -		return PMSEVFR_EL1_RES0_V1P2;
> +		return PMSEVFR_EL1_RES0_V1P4;

See my reply [1] to Leo about this function, but I think we should just
remove it.

Will

[1] https://lore.kernel.org/all/20250707-arm_spe_support_hitm_overhead_v1_public-v3-0-33ea82da3280@arm.com/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 01/10] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register
  2025-06-05 10:48 ` [PATCH v3 01/10] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register James Clark
  2025-06-12  6:44   ` Anshuman Khandual
@ 2025-07-14 13:32   ` Will Deacon
  1 sibling, 0 replies; 37+ messages in thread
From: Will Deacon @ 2025-07-14 13:32 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Thu, Jun 05, 2025 at 11:48:59AM +0100, James Clark wrote:
> Add new fields and register that are introduced for the features
> FEAT_SPE_EFT (extended filtering) and FEAT_SPE_FDS (data source
> filtering).
> 
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  arch/arm64/tools/sysreg | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 03/10] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering
  2025-06-05 10:49 ` [PATCH v3 03/10] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering James Clark
@ 2025-07-14 13:46   ` Will Deacon
  2025-07-15 12:39     ` James Clark
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2025-07-14 13:46 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Thu, Jun 05, 2025 at 11:49:01AM +0100, James Clark wrote:
> FEAT_SPE_EFT (optional from Armv9.4) adds mask bits for the existing
> load, store and branch filters. It also adds two new filter bits for
> SIMD and floating point with their own associated mask bits. The current
> filters only allow OR filtering on samples that are load OR store etc,
> and the new mask bits allow setting part of the filter to an AND, for
> example filtering samples that are store AND SIMD. With mask bits set to
> 0, the OR behavior is preserved, so the unless any masks are explicitly
> set old filters will behave the same.
> 
> Add them all and make them behave the same way as existing format bits,
> hidden and return EOPNOTSUPP if set when the feature doesn't exist.
> 
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  drivers/perf/arm_spe_pmu.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index d9f6d229dce8..9309b846f642 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -86,6 +86,7 @@ struct arm_spe_pmu {
>  #define SPE_PMU_FEAT_ERND			(1UL << 5)
>  #define SPE_PMU_FEAT_INV_FILT_EVT		(1UL << 6)
>  #define SPE_PMU_FEAT_DISCARD			(1UL << 7)
> +#define SPE_PMU_FEAT_EFT			(1UL << 8)
>  #define SPE_PMU_FEAT_DEV_PROBED			(1UL << 63)
>  	u64					features;
>  
> @@ -197,6 +198,27 @@ static const struct attribute_group arm_spe_pmu_cap_group = {
>  #define ATTR_CFG_FLD_discard_CFG		config	/* PMBLIMITR_EL1.FM = DISCARD */
>  #define ATTR_CFG_FLD_discard_LO			35
>  #define ATTR_CFG_FLD_discard_HI			35
> +#define ATTR_CFG_FLD_branch_filter_mask_CFG	config	/* PMSFCR_EL1.Bm */
> +#define ATTR_CFG_FLD_branch_filter_mask_LO	36
> +#define ATTR_CFG_FLD_branch_filter_mask_HI	36
> +#define ATTR_CFG_FLD_load_filter_mask_CFG	config	/* PMSFCR_EL1.LDm */
> +#define ATTR_CFG_FLD_load_filter_mask_LO	37
> +#define ATTR_CFG_FLD_load_filter_mask_HI	37
> +#define ATTR_CFG_FLD_store_filter_mask_CFG	config	/* PMSFCR_EL1.STm */
> +#define ATTR_CFG_FLD_store_filter_mask_LO	38
> +#define ATTR_CFG_FLD_store_filter_mask_HI	38
> +#define ATTR_CFG_FLD_simd_filter_CFG		config	/* PMSFCR_EL1.SIMD */
> +#define ATTR_CFG_FLD_simd_filter_LO		39
> +#define ATTR_CFG_FLD_simd_filter_HI		39
> +#define ATTR_CFG_FLD_simd_filter_mask_CFG	config	/* PMSFCR_EL1.SIMDm */
> +#define ATTR_CFG_FLD_simd_filter_mask_LO	40
> +#define ATTR_CFG_FLD_simd_filter_mask_HI	40
> +#define ATTR_CFG_FLD_float_filter_CFG		config	/* PMSFCR_EL1.FP */
> +#define ATTR_CFG_FLD_float_filter_LO		41
> +#define ATTR_CFG_FLD_float_filter_HI		41
> +#define ATTR_CFG_FLD_float_filter_mask_CFG	config	/* PMSFCR_EL1.FPm */
> +#define ATTR_CFG_FLD_float_filter_mask_LO	42
> +#define ATTR_CFG_FLD_float_filter_mask_HI	42
>  
>  #define ATTR_CFG_FLD_event_filter_CFG		config1	/* PMSEVFR_EL1 */
>  #define ATTR_CFG_FLD_event_filter_LO		0
> @@ -215,8 +237,15 @@ GEN_PMU_FORMAT_ATTR(pa_enable);
>  GEN_PMU_FORMAT_ATTR(pct_enable);
>  GEN_PMU_FORMAT_ATTR(jitter);
>  GEN_PMU_FORMAT_ATTR(branch_filter);
> +GEN_PMU_FORMAT_ATTR(branch_filter_mask);
>  GEN_PMU_FORMAT_ATTR(load_filter);
> +GEN_PMU_FORMAT_ATTR(load_filter_mask);
>  GEN_PMU_FORMAT_ATTR(store_filter);
> +GEN_PMU_FORMAT_ATTR(store_filter_mask);
> +GEN_PMU_FORMAT_ATTR(simd_filter);
> +GEN_PMU_FORMAT_ATTR(simd_filter_mask);
> +GEN_PMU_FORMAT_ATTR(float_filter);
> +GEN_PMU_FORMAT_ATTR(float_filter_mask);
>  GEN_PMU_FORMAT_ATTR(event_filter);
>  GEN_PMU_FORMAT_ATTR(inv_event_filter);
>  GEN_PMU_FORMAT_ATTR(min_latency);
> @@ -228,8 +257,15 @@ static struct attribute *arm_spe_pmu_formats_attr[] = {
>  	&format_attr_pct_enable.attr,
>  	&format_attr_jitter.attr,
>  	&format_attr_branch_filter.attr,
> +	&format_attr_branch_filter_mask.attr,
>  	&format_attr_load_filter.attr,
> +	&format_attr_load_filter_mask.attr,
>  	&format_attr_store_filter.attr,
> +	&format_attr_store_filter_mask.attr,
> +	&format_attr_simd_filter.attr,
> +	&format_attr_simd_filter_mask.attr,
> +	&format_attr_float_filter.attr,
> +	&format_attr_float_filter_mask.attr,
>  	&format_attr_event_filter.attr,
>  	&format_attr_inv_event_filter.attr,
>  	&format_attr_min_latency.attr,
> @@ -250,6 +286,16 @@ static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
>  	if (attr == &format_attr_inv_event_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_INV_FILT_EVT))
>  		return 0;
>  
> +	if ((attr == &format_attr_branch_filter_mask.attr ||
> +	     attr == &format_attr_load_filter_mask.attr ||
> +	     attr == &format_attr_store_filter_mask.attr ||
> +	     attr == &format_attr_simd_filter.attr ||
> +	     attr == &format_attr_simd_filter_mask.attr ||
> +	     attr == &format_attr_float_filter.attr ||
> +	     attr == &format_attr_float_filter_mask.attr) &&
> +	     !(spe_pmu->features & SPE_PMU_FEAT_EFT))
> +		return 0;
> +
>  	return attr->mode;
>  }
>  
> @@ -341,8 +387,15 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
>  	u64 reg = 0;
>  
>  	reg |= FIELD_PREP(PMSFCR_EL1_LD, ATTR_CFG_GET_FLD(attr, load_filter));
> +	reg |= FIELD_PREP(PMSFCR_EL1_LDm, ATTR_CFG_GET_FLD(attr, load_filter_mask));
>  	reg |= FIELD_PREP(PMSFCR_EL1_ST, ATTR_CFG_GET_FLD(attr, store_filter));
> +	reg |= FIELD_PREP(PMSFCR_EL1_STm, ATTR_CFG_GET_FLD(attr, store_filter_mask));
>  	reg |= FIELD_PREP(PMSFCR_EL1_B, ATTR_CFG_GET_FLD(attr, branch_filter));
> +	reg |= FIELD_PREP(PMSFCR_EL1_Bm, ATTR_CFG_GET_FLD(attr, branch_filter_mask));
> +	reg |= FIELD_PREP(PMSFCR_EL1_SIMD, ATTR_CFG_GET_FLD(attr, simd_filter));
> +	reg |= FIELD_PREP(PMSFCR_EL1_SIMDm, ATTR_CFG_GET_FLD(attr, simd_filter_mask));
> +	reg |= FIELD_PREP(PMSFCR_EL1_FP, ATTR_CFG_GET_FLD(attr, float_filter));
> +	reg |= FIELD_PREP(PMSFCR_EL1_FPm, ATTR_CFG_GET_FLD(attr, float_filter_mask));
>  
>  	if (reg)
>  		reg |= PMSFCR_EL1_FT;
> @@ -716,6 +769,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>  	u64 reg;
>  	struct perf_event_attr *attr = &event->attr;
>  	struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
> +	const u64 feat_spe_eft_bits = PMSFCR_EL1_LDm | PMSFCR_EL1_STm |
> +				      PMSFCR_EL1_Bm | PMSFCR_EL1_SIMD |
> +				      PMSFCR_EL1_SIMDm | PMSFCR_EL1_FP |
> +				      PMSFCR_EL1_FPm;

I'm not sure this constant is adding much, especially as its defined
quite a long way from its single user.

>  
>  	/* This is, of course, deeply driver-specific */
>  	if (attr->type != event->pmu->type)
> @@ -761,6 +818,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>  	    !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
>  		return -EOPNOTSUPP;
>  
> +	if ((reg & feat_spe_eft_bits) &&
> +	    !(spe_pmu->features & SPE_PMU_FEAT_EFT))
> +		return -EOPNOTSUPP;

You could probably just spell out the entire thing tbh. It's verbose,
but it's also pretty clear and it means we have everything in one place:

	if ((FIELD_GET(PMSFCR_EL1_LDm, reg) ||
	     FIELD_GET(PMSFCR_EL1_STm, reg) ||
	     FIELD_GET(PMSFCR_EL1_Bm, reg) ||
	     FIELD_GET(PMSFCR_EL1_SIMD, reg) ||
	     FIELD_GET(PMSFCR_EL1_SIMDm, reg) ||
	     FIELD_GET(PMSFCR_EL1_FP, reg) ||
	     FIELD_GET(PMSFCR_EL1_FPm, reg)) &&
           !(spe_pmu->features & SPE_PMU_FEAT_EFT))
		return -EOPNOTSUPP;

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  2025-06-05 10:49 ` [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS James Clark
@ 2025-07-14 13:54   ` Will Deacon
  2025-07-15 12:48     ` James Clark
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2025-07-14 13:54 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Thu, Jun 05, 2025 at 11:49:02AM +0100, James Clark wrote:
> SPE data source filtering (optional from Armv8.8) requires that traps to
> the filter register PMSDSFR be disabled. Document the requirements and
> disable the traps if the feature is present.
> 
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  Documentation/arch/arm64/booting.rst | 11 +++++++++++
>  arch/arm64/include/asm/el2_setup.h   | 14 ++++++++++++++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
> index dee7b6de864f..abd75085a239 100644
> --- a/Documentation/arch/arm64/booting.rst
> +++ b/Documentation/arch/arm64/booting.rst
> @@ -404,6 +404,17 @@ Before jumping into the kernel, the following conditions must be met:
>      - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
>      - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
>  
> +  For CPUs with SPE data source filtering (FEAT_SPE_FDS):
> +
> +  - If EL3 is present:
> +
> +    - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
> +
> +  - If the kernel is entered at EL1 and EL2 is present:
> +
> +    - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
> +    - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
> +
>    For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
>  
>    - If the kernel is entered at EL1 and EL2 is present:
> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
> index 1e7c7475e43f..02b4a7fc016e 100644
> --- a/arch/arm64/include/asm/el2_setup.h
> +++ b/arch/arm64/include/asm/el2_setup.h
> @@ -279,6 +279,20 @@
>  	orr	x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
>  	orr	x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
>  .Lskip_pmuv3p9_\@:
> +	mrs	x1, id_aa64dfr0_el1
> +	ubfx	x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
> +	/* If SPE is implemented, */
> +	cmp	x1, #ID_AA64DFR0_EL1_PMSVer_IMP
> +	b.lt	.Lskip_spefds_\@
> +	/* we can read PMSIDR and */
> +	mrs_s	x1, SYS_PMSIDR_EL1
> +	and	x1, x1,  #PMSIDR_EL1_FDS
> +	/* if FEAT_SPE_FDS is implemented, */
> +	cbz	x1, .Lskip_spefds_\@
> +	/* disable traps to PMSDSFR. */
> +	orr	x0, x0, #HDFGRTR2_EL2_nPMSDSFR_EL1

Why is this being done here rather than alongside the existing SPE
configuration of HDFGRTR_EL2 and HDFGWTR_EL2 near the start of
__init_el2_fgt?

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] perf: Add perf_event_attr::config4
  2025-06-05 10:49 ` [PATCH v3 06/10] perf: Add perf_event_attr::config4 James Clark
  2025-06-30 15:35   ` Ian Rogers
@ 2025-07-14 13:56   ` Will Deacon
  1 sibling, 0 replies; 37+ messages in thread
From: Will Deacon @ 2025-07-14 13:56 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Thu, Jun 05, 2025 at 11:49:04AM +0100, James Clark wrote:
> Arm FEAT_SPE_FDS adds the ability to filter on the data source of a
> packet using another 64-bits of event filtering control. As the existing
> perf_event_attr::configN fields are all used up for SPE PMU, an
> additional field is needed. Add a new 'config4' field.
> 
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  include/uapi/linux/perf_event.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 78a362b80027..0d0ed85ad8cb 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -382,6 +382,7 @@ enum perf_event_read_format {
>  #define PERF_ATTR_SIZE_VER6			120	/* Add: aux_sample_size */
>  #define PERF_ATTR_SIZE_VER7			128	/* Add: sig_data */
>  #define PERF_ATTR_SIZE_VER8			136	/* Add: config3 */
> +#define PERF_ATTR_SIZE_VER9			144	/* add: config4 */
>  
>  /*
>   * 'struct perf_event_attr' contains various attributes that define
> @@ -543,6 +544,7 @@ struct perf_event_attr {
>  	__u64	sig_data;
>  
>  	__u64	config3; /* extension of config2 */
> +	__u64	config4; /* extension of config3 */
>  };

Looks straightforward to me, but this will need an Ack from one of the perf
core maintainers.

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source
  2025-06-05 10:49 ` [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source James Clark
@ 2025-07-14 14:04   ` Will Deacon
  2025-07-15 13:04     ` James Clark
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2025-07-14 14:04 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Thu, Jun 05, 2025 at 11:49:05AM +0100, James Clark wrote:
> SPE_FEAT_FDS adds the ability to filter on the data source of packets.
> Like the other existing filters, enable filtering with PMSFCR_EL1.FDS
> when any of the filter bits are set.
> 
> Each bit maps to data sources 0-63 described by bits[0:5] in the data
> source packet (although the full range of data source is 16 bits so
> higher value data sources can't be filtered on). The filter is an OR of
> all the bits, so for example setting bits 0 and 3 filters packets from
> data sources 0 OR 3.
> 
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  drivers/perf/arm_spe_pmu.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 9309b846f642..d04318411f77 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -87,6 +87,7 @@ struct arm_spe_pmu {
>  #define SPE_PMU_FEAT_INV_FILT_EVT		(1UL << 6)
>  #define SPE_PMU_FEAT_DISCARD			(1UL << 7)
>  #define SPE_PMU_FEAT_EFT			(1UL << 8)
> +#define SPE_PMU_FEAT_FDS			(1UL << 9)
>  #define SPE_PMU_FEAT_DEV_PROBED			(1UL << 63)
>  	u64					features;
>  
> @@ -232,6 +233,10 @@ static const struct attribute_group arm_spe_pmu_cap_group = {
>  #define ATTR_CFG_FLD_inv_event_filter_LO	0
>  #define ATTR_CFG_FLD_inv_event_filter_HI	63
>  
> +#define ATTR_CFG_FLD_data_src_filter_CFG	config4	/* PMSDSFR_EL1 */
> +#define ATTR_CFG_FLD_data_src_filter_LO	0
> +#define ATTR_CFG_FLD_data_src_filter_HI	63
> +
>  GEN_PMU_FORMAT_ATTR(ts_enable);
>  GEN_PMU_FORMAT_ATTR(pa_enable);
>  GEN_PMU_FORMAT_ATTR(pct_enable);
> @@ -248,6 +253,7 @@ GEN_PMU_FORMAT_ATTR(float_filter);
>  GEN_PMU_FORMAT_ATTR(float_filter_mask);
>  GEN_PMU_FORMAT_ATTR(event_filter);
>  GEN_PMU_FORMAT_ATTR(inv_event_filter);
> +GEN_PMU_FORMAT_ATTR(data_src_filter);
>  GEN_PMU_FORMAT_ATTR(min_latency);
>  GEN_PMU_FORMAT_ATTR(discard);
>  
> @@ -268,6 +274,7 @@ static struct attribute *arm_spe_pmu_formats_attr[] = {
>  	&format_attr_float_filter_mask.attr,
>  	&format_attr_event_filter.attr,
>  	&format_attr_inv_event_filter.attr,
> +	&format_attr_data_src_filter.attr,
>  	&format_attr_min_latency.attr,
>  	&format_attr_discard.attr,
>  	NULL,
> @@ -286,6 +293,9 @@ static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
>  	if (attr == &format_attr_inv_event_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_INV_FILT_EVT))
>  		return 0;
>  
> +	if (attr == &format_attr_data_src_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_FDS))
> +		return 0;
> +
>  	if ((attr == &format_attr_branch_filter_mask.attr ||
>  	     attr == &format_attr_load_filter_mask.attr ||
>  	     attr == &format_attr_store_filter_mask.attr ||
> @@ -406,6 +416,9 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
>  	if (ATTR_CFG_GET_FLD(attr, inv_event_filter))
>  		reg |= PMSFCR_EL1_FnE;
>  
> +	if (ATTR_CFG_GET_FLD(attr, data_src_filter))
> +		reg |= PMSFCR_EL1_FDS;

Is the polarity correct here? The description of PMSDSFR_EL1.S<m> suggests
that setting bits to 1 _excludes_ the FDS filtering.

>  	if (ATTR_CFG_GET_FLD(attr, min_latency))
>  		reg |= PMSFCR_EL1_FL;
>  
> @@ -430,6 +443,12 @@ static u64 arm_spe_event_to_pmslatfr(struct perf_event *event)
>  	return FIELD_PREP(PMSLATFR_EL1_MINLAT, ATTR_CFG_GET_FLD(attr, min_latency));
>  }
>  
> +static u64 arm_spe_event_to_pmsdsfr(struct perf_event *event)
> +{
> +	struct perf_event_attr *attr = &event->attr;
> +	return ATTR_CFG_GET_FLD(attr, data_src_filter);
> +}
> +
>  static void arm_spe_pmu_pad_buf(struct perf_output_handle *handle, int len)
>  {
>  	struct arm_spe_pmu_buf *buf = perf_get_aux(handle);
> @@ -788,6 +807,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>  	if (arm_spe_event_to_pmsnevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
>  		return -EOPNOTSUPP;
>  
> +	if (arm_spe_event_to_pmsdsfr(event) &&
> +	    !(spe_pmu->features & SPE_PMU_FEAT_FDS))
> +		return -EOPNOTSUPP;

Same question here.

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters
  2025-07-14 13:26   ` Will Deacon
@ 2025-07-15 11:23     ` James Clark
  0 siblings, 0 replies; 37+ messages in thread
From: James Clark @ 2025-07-15 11:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm



On 14/07/2025 2:26 pm, Will Deacon wrote:
> On Thu, Jun 05, 2025 at 11:49:00AM +0100, James Clark wrote:
>> FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits, so
>> remove them from the previous version's RES0 bits using
>> PMSEVFR_EL1_RES0_V1P4_EXCL. It also makes some previously available bits
>> unavailable again, so add those back using PMSEVFR_EL1_RES0_V1P4_INCL.
>> E.g:
>>
>>    E[30], bit [30]
>>    When FEAT_SPEv1p4 is _not_ implemented ...
>>
>> FEAT_SPE_V1P3 has the same filters as V1P2 so explicitly add it to the
>> switch.
>>
>> Reviewed-by: Leo Yan <leo.yan@arm.com>
>> Tested-by: Leo Yan <leo.yan@arm.com>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   arch/arm64/include/asm/sysreg.h | 7 +++++++
>>   drivers/perf/arm_spe_pmu.c      | 5 ++++-
>>   2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index f1bb0d10c39a..880090df3efc 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -358,6 +358,13 @@
>>   	(PMSEVFR_EL1_RES0_IMP & ~(BIT_ULL(18) | BIT_ULL(17) | BIT_ULL(11)))
>>   #define PMSEVFR_EL1_RES0_V1P2	\
>>   	(PMSEVFR_EL1_RES0_V1P1 & ~BIT_ULL(6))
>> +#define PMSEVFR_EL1_RES0_V1P4_EXCL \
>> +	(BIT_ULL(2) | BIT_ULL(4) | GENMASK_ULL(10, 8) | GENMASK_ULL(23, 19))
>> +#define PMSEVFR_EL1_RES0_V1P4_INCL \
>> +	(GENMASK_ULL(31, 26))
>> +#define PMSEVFR_EL1_RES0_V1P4	\
>> +	(PMSEVFR_EL1_RES0_V1P4_INCL | \
>> +	(PMSEVFR_EL1_RES0_V1P2 & ~PMSEVFR_EL1_RES0_V1P4_EXCL))
>>   
>>   /* Buffer error reporting */
>>   #define PMBSR_EL1_FAULT_FSC_SHIFT	PMBSR_EL1_MSS_SHIFT
>> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> index 3efed8839a4e..d9f6d229dce8 100644
>> --- a/drivers/perf/arm_spe_pmu.c
>> +++ b/drivers/perf/arm_spe_pmu.c
>> @@ -701,9 +701,12 @@ static u64 arm_spe_pmsevfr_res0(u16 pmsver)
>>   	case ID_AA64DFR0_EL1_PMSVer_V1P1:
>>   		return PMSEVFR_EL1_RES0_V1P1;
>>   	case ID_AA64DFR0_EL1_PMSVer_V1P2:
>> +	case ID_AA64DFR0_EL1_PMSVer_V1P3:
>> +		return PMSEVFR_EL1_RES0_V1P2;
>> +	case ID_AA64DFR0_EL1_PMSVer_V1P4:
>>   	/* Return the highest version we support in default */
>>   	default:
>> -		return PMSEVFR_EL1_RES0_V1P2;
>> +		return PMSEVFR_EL1_RES0_V1P4;
> 
> See my reply [1] to Leo about this function, but I think we should just
> remove it.
> 
> Will
> 
> [1] https://lore.kernel.org/all/20250707-arm_spe_support_hitm_overhead_v1_public-v3-0-33ea82da3280@arm.com/

We're only refusing filters that we know for sure are RES0. Unless 
there's a mistake, the ones that are maybes are still up to userspace to 
decide whether it wants to use them or not.

I think it could be quite useful for some automated tool to fall back to 
another behavior if it needs an event that isn't implemented.

If they were _all_ defined as maybes like "When FEAT_SPEv1p4 is 
implemented or filtering on event 2 is optionally supported" then I 
would agree it's not definite enough to bother restricting them. But a 
lot of them are known for sure like "When FEAT_SPEv1p4 is not 
implemented ...", so I don't see the harm in preventing use of those.

Or as I mentioned in the other thread if we think we can probe the valid 
filters that would be even better.

Thanks
James


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 03/10] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering
  2025-07-14 13:46   ` Will Deacon
@ 2025-07-15 12:39     ` James Clark
  0 siblings, 0 replies; 37+ messages in thread
From: James Clark @ 2025-07-15 12:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm



On 14/07/2025 2:46 pm, Will Deacon wrote:
> On Thu, Jun 05, 2025 at 11:49:01AM +0100, James Clark wrote:
>> FEAT_SPE_EFT (optional from Armv9.4) adds mask bits for the existing
>> load, store and branch filters. It also adds two new filter bits for
>> SIMD and floating point with their own associated mask bits. The current
>> filters only allow OR filtering on samples that are load OR store etc,
>> and the new mask bits allow setting part of the filter to an AND, for
>> example filtering samples that are store AND SIMD. With mask bits set to
>> 0, the OR behavior is preserved, so the unless any masks are explicitly
>> set old filters will behave the same.
>>
>> Add them all and make them behave the same way as existing format bits,
>> hidden and return EOPNOTSUPP if set when the feature doesn't exist.
>>
>> Reviewed-by: Leo Yan <leo.yan@arm.com>
>> Tested-by: Leo Yan <leo.yan@arm.com>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   drivers/perf/arm_spe_pmu.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 64 insertions(+)
>>
>> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> index d9f6d229dce8..9309b846f642 100644
>> --- a/drivers/perf/arm_spe_pmu.c
>> +++ b/drivers/perf/arm_spe_pmu.c
>> @@ -86,6 +86,7 @@ struct arm_spe_pmu {
>>   #define SPE_PMU_FEAT_ERND			(1UL << 5)
>>   #define SPE_PMU_FEAT_INV_FILT_EVT		(1UL << 6)
>>   #define SPE_PMU_FEAT_DISCARD			(1UL << 7)
>> +#define SPE_PMU_FEAT_EFT			(1UL << 8)
>>   #define SPE_PMU_FEAT_DEV_PROBED			(1UL << 63)
>>   	u64					features;
>>   
>> @@ -197,6 +198,27 @@ static const struct attribute_group arm_spe_pmu_cap_group = {
>>   #define ATTR_CFG_FLD_discard_CFG		config	/* PMBLIMITR_EL1.FM = DISCARD */
>>   #define ATTR_CFG_FLD_discard_LO			35
>>   #define ATTR_CFG_FLD_discard_HI			35
>> +#define ATTR_CFG_FLD_branch_filter_mask_CFG	config	/* PMSFCR_EL1.Bm */
>> +#define ATTR_CFG_FLD_branch_filter_mask_LO	36
>> +#define ATTR_CFG_FLD_branch_filter_mask_HI	36
>> +#define ATTR_CFG_FLD_load_filter_mask_CFG	config	/* PMSFCR_EL1.LDm */
>> +#define ATTR_CFG_FLD_load_filter_mask_LO	37
>> +#define ATTR_CFG_FLD_load_filter_mask_HI	37
>> +#define ATTR_CFG_FLD_store_filter_mask_CFG	config	/* PMSFCR_EL1.STm */
>> +#define ATTR_CFG_FLD_store_filter_mask_LO	38
>> +#define ATTR_CFG_FLD_store_filter_mask_HI	38
>> +#define ATTR_CFG_FLD_simd_filter_CFG		config	/* PMSFCR_EL1.SIMD */
>> +#define ATTR_CFG_FLD_simd_filter_LO		39
>> +#define ATTR_CFG_FLD_simd_filter_HI		39
>> +#define ATTR_CFG_FLD_simd_filter_mask_CFG	config	/* PMSFCR_EL1.SIMDm */
>> +#define ATTR_CFG_FLD_simd_filter_mask_LO	40
>> +#define ATTR_CFG_FLD_simd_filter_mask_HI	40
>> +#define ATTR_CFG_FLD_float_filter_CFG		config	/* PMSFCR_EL1.FP */
>> +#define ATTR_CFG_FLD_float_filter_LO		41
>> +#define ATTR_CFG_FLD_float_filter_HI		41
>> +#define ATTR_CFG_FLD_float_filter_mask_CFG	config	/* PMSFCR_EL1.FPm */
>> +#define ATTR_CFG_FLD_float_filter_mask_LO	42
>> +#define ATTR_CFG_FLD_float_filter_mask_HI	42
>>   
>>   #define ATTR_CFG_FLD_event_filter_CFG		config1	/* PMSEVFR_EL1 */
>>   #define ATTR_CFG_FLD_event_filter_LO		0
>> @@ -215,8 +237,15 @@ GEN_PMU_FORMAT_ATTR(pa_enable);
>>   GEN_PMU_FORMAT_ATTR(pct_enable);
>>   GEN_PMU_FORMAT_ATTR(jitter);
>>   GEN_PMU_FORMAT_ATTR(branch_filter);
>> +GEN_PMU_FORMAT_ATTR(branch_filter_mask);
>>   GEN_PMU_FORMAT_ATTR(load_filter);
>> +GEN_PMU_FORMAT_ATTR(load_filter_mask);
>>   GEN_PMU_FORMAT_ATTR(store_filter);
>> +GEN_PMU_FORMAT_ATTR(store_filter_mask);
>> +GEN_PMU_FORMAT_ATTR(simd_filter);
>> +GEN_PMU_FORMAT_ATTR(simd_filter_mask);
>> +GEN_PMU_FORMAT_ATTR(float_filter);
>> +GEN_PMU_FORMAT_ATTR(float_filter_mask);
>>   GEN_PMU_FORMAT_ATTR(event_filter);
>>   GEN_PMU_FORMAT_ATTR(inv_event_filter);
>>   GEN_PMU_FORMAT_ATTR(min_latency);
>> @@ -228,8 +257,15 @@ static struct attribute *arm_spe_pmu_formats_attr[] = {
>>   	&format_attr_pct_enable.attr,
>>   	&format_attr_jitter.attr,
>>   	&format_attr_branch_filter.attr,
>> +	&format_attr_branch_filter_mask.attr,
>>   	&format_attr_load_filter.attr,
>> +	&format_attr_load_filter_mask.attr,
>>   	&format_attr_store_filter.attr,
>> +	&format_attr_store_filter_mask.attr,
>> +	&format_attr_simd_filter.attr,
>> +	&format_attr_simd_filter_mask.attr,
>> +	&format_attr_float_filter.attr,
>> +	&format_attr_float_filter_mask.attr,
>>   	&format_attr_event_filter.attr,
>>   	&format_attr_inv_event_filter.attr,
>>   	&format_attr_min_latency.attr,
>> @@ -250,6 +286,16 @@ static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
>>   	if (attr == &format_attr_inv_event_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_INV_FILT_EVT))
>>   		return 0;
>>   
>> +	if ((attr == &format_attr_branch_filter_mask.attr ||
>> +	     attr == &format_attr_load_filter_mask.attr ||
>> +	     attr == &format_attr_store_filter_mask.attr ||
>> +	     attr == &format_attr_simd_filter.attr ||
>> +	     attr == &format_attr_simd_filter_mask.attr ||
>> +	     attr == &format_attr_float_filter.attr ||
>> +	     attr == &format_attr_float_filter_mask.attr) &&
>> +	     !(spe_pmu->features & SPE_PMU_FEAT_EFT))
>> +		return 0;
>> +
>>   	return attr->mode;
>>   }
>>   
>> @@ -341,8 +387,15 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
>>   	u64 reg = 0;
>>   
>>   	reg |= FIELD_PREP(PMSFCR_EL1_LD, ATTR_CFG_GET_FLD(attr, load_filter));
>> +	reg |= FIELD_PREP(PMSFCR_EL1_LDm, ATTR_CFG_GET_FLD(attr, load_filter_mask));
>>   	reg |= FIELD_PREP(PMSFCR_EL1_ST, ATTR_CFG_GET_FLD(attr, store_filter));
>> +	reg |= FIELD_PREP(PMSFCR_EL1_STm, ATTR_CFG_GET_FLD(attr, store_filter_mask));
>>   	reg |= FIELD_PREP(PMSFCR_EL1_B, ATTR_CFG_GET_FLD(attr, branch_filter));
>> +	reg |= FIELD_PREP(PMSFCR_EL1_Bm, ATTR_CFG_GET_FLD(attr, branch_filter_mask));
>> +	reg |= FIELD_PREP(PMSFCR_EL1_SIMD, ATTR_CFG_GET_FLD(attr, simd_filter));
>> +	reg |= FIELD_PREP(PMSFCR_EL1_SIMDm, ATTR_CFG_GET_FLD(attr, simd_filter_mask));
>> +	reg |= FIELD_PREP(PMSFCR_EL1_FP, ATTR_CFG_GET_FLD(attr, float_filter));
>> +	reg |= FIELD_PREP(PMSFCR_EL1_FPm, ATTR_CFG_GET_FLD(attr, float_filter_mask));
>>   
>>   	if (reg)
>>   		reg |= PMSFCR_EL1_FT;
>> @@ -716,6 +769,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>>   	u64 reg;
>>   	struct perf_event_attr *attr = &event->attr;
>>   	struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
>> +	const u64 feat_spe_eft_bits = PMSFCR_EL1_LDm | PMSFCR_EL1_STm |
>> +				      PMSFCR_EL1_Bm | PMSFCR_EL1_SIMD |
>> +				      PMSFCR_EL1_SIMDm | PMSFCR_EL1_FP |
>> +				      PMSFCR_EL1_FPm;
> 
> I'm not sure this constant is adding much, especially as its defined
> quite a long way from its single user.
> 
>>   
>>   	/* This is, of course, deeply driver-specific */
>>   	if (attr->type != event->pmu->type)
>> @@ -761,6 +818,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>>   	    !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
>>   		return -EOPNOTSUPP;
>>   
>> +	if ((reg & feat_spe_eft_bits) &&
>> +	    !(spe_pmu->features & SPE_PMU_FEAT_EFT))
>> +		return -EOPNOTSUPP;
> 
> You could probably just spell out the entire thing tbh. It's verbose,
> but it's also pretty clear and it means we have everything in one place:
> 
> 	if ((FIELD_GET(PMSFCR_EL1_LDm, reg) ||
> 	     FIELD_GET(PMSFCR_EL1_STm, reg) ||
> 	     FIELD_GET(PMSFCR_EL1_Bm, reg) ||
> 	     FIELD_GET(PMSFCR_EL1_SIMD, reg) ||
> 	     FIELD_GET(PMSFCR_EL1_SIMDm, reg) ||
> 	     FIELD_GET(PMSFCR_EL1_FP, reg) ||
> 	     FIELD_GET(PMSFCR_EL1_FPm, reg)) &&
>             !(spe_pmu->features & SPE_PMU_FEAT_EFT))
> 		return -EOPNOTSUPP;
> 
> Will

Ack


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  2025-07-14 13:54   ` Will Deacon
@ 2025-07-15 12:48     ` James Clark
  2025-07-15 12:57       ` Will Deacon
  0 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-07-15 12:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm



On 14/07/2025 2:54 pm, Will Deacon wrote:
> On Thu, Jun 05, 2025 at 11:49:02AM +0100, James Clark wrote:
>> SPE data source filtering (optional from Armv8.8) requires that traps to
>> the filter register PMSDSFR be disabled. Document the requirements and
>> disable the traps if the feature is present.
>>
>> Tested-by: Leo Yan <leo.yan@arm.com>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   Documentation/arch/arm64/booting.rst | 11 +++++++++++
>>   arch/arm64/include/asm/el2_setup.h   | 14 ++++++++++++++
>>   2 files changed, 25 insertions(+)
>>
>> diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
>> index dee7b6de864f..abd75085a239 100644
>> --- a/Documentation/arch/arm64/booting.rst
>> +++ b/Documentation/arch/arm64/booting.rst
>> @@ -404,6 +404,17 @@ Before jumping into the kernel, the following conditions must be met:
>>       - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
>>       - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
>>   
>> +  For CPUs with SPE data source filtering (FEAT_SPE_FDS):
>> +
>> +  - If EL3 is present:
>> +
>> +    - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
>> +
>> +  - If the kernel is entered at EL1 and EL2 is present:
>> +
>> +    - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
>> +    - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
>> +
>>     For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
>>   
>>     - If the kernel is entered at EL1 and EL2 is present:
>> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
>> index 1e7c7475e43f..02b4a7fc016e 100644
>> --- a/arch/arm64/include/asm/el2_setup.h
>> +++ b/arch/arm64/include/asm/el2_setup.h
>> @@ -279,6 +279,20 @@
>>   	orr	x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
>>   	orr	x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
>>   .Lskip_pmuv3p9_\@:
>> +	mrs	x1, id_aa64dfr0_el1
>> +	ubfx	x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
>> +	/* If SPE is implemented, */
>> +	cmp	x1, #ID_AA64DFR0_EL1_PMSVer_IMP
>> +	b.lt	.Lskip_spefds_\@
>> +	/* we can read PMSIDR and */
>> +	mrs_s	x1, SYS_PMSIDR_EL1
>> +	and	x1, x1,  #PMSIDR_EL1_FDS
>> +	/* if FEAT_SPE_FDS is implemented, */
>> +	cbz	x1, .Lskip_spefds_\@
>> +	/* disable traps to PMSDSFR. */
>> +	orr	x0, x0, #HDFGRTR2_EL2_nPMSDSFR_EL1
> 
> Why is this being done here rather than alongside the existing SPE
> configuration of HDFGRTR_EL2 and HDFGWTR_EL2 near the start of
> __init_el2_fgt?
> 
> Will

I thought everything was separated by which trap configs it writes to, 
rather than the feature. This SPE feature is in HDFGRTR2 so I put it in 
__init_el2_fgt2 rather than __init_el2_fgt.

I suppose we could have a single __init_el2_spe that writes to both 
HDFGRTR and HDFGRTR2 but we'd have to be careful to not overwrite what 
was already done in the other sections.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  2025-07-15 12:48     ` James Clark
@ 2025-07-15 12:57       ` Will Deacon
  2025-07-15 13:10         ` James Clark
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2025-07-15 12:57 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Tue, Jul 15, 2025 at 01:48:03PM +0100, James Clark wrote:
> 
> 
> On 14/07/2025 2:54 pm, Will Deacon wrote:
> > On Thu, Jun 05, 2025 at 11:49:02AM +0100, James Clark wrote:
> > > SPE data source filtering (optional from Armv8.8) requires that traps to
> > > the filter register PMSDSFR be disabled. Document the requirements and
> > > disable the traps if the feature is present.
> > > 
> > > Tested-by: Leo Yan <leo.yan@arm.com>
> > > Signed-off-by: James Clark <james.clark@linaro.org>
> > > ---
> > >   Documentation/arch/arm64/booting.rst | 11 +++++++++++
> > >   arch/arm64/include/asm/el2_setup.h   | 14 ++++++++++++++
> > >   2 files changed, 25 insertions(+)
> > > 
> > > diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
> > > index dee7b6de864f..abd75085a239 100644
> > > --- a/Documentation/arch/arm64/booting.rst
> > > +++ b/Documentation/arch/arm64/booting.rst
> > > @@ -404,6 +404,17 @@ Before jumping into the kernel, the following conditions must be met:
> > >       - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
> > >       - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
> > > +  For CPUs with SPE data source filtering (FEAT_SPE_FDS):
> > > +
> > > +  - If EL3 is present:
> > > +
> > > +    - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
> > > +
> > > +  - If the kernel is entered at EL1 and EL2 is present:
> > > +
> > > +    - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
> > > +    - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
> > > +
> > >     For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
> > >     - If the kernel is entered at EL1 and EL2 is present:
> > > diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
> > > index 1e7c7475e43f..02b4a7fc016e 100644
> > > --- a/arch/arm64/include/asm/el2_setup.h
> > > +++ b/arch/arm64/include/asm/el2_setup.h
> > > @@ -279,6 +279,20 @@
> > >   	orr	x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
> > >   	orr	x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
> > >   .Lskip_pmuv3p9_\@:
> > > +	mrs	x1, id_aa64dfr0_el1
> > > +	ubfx	x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
> > > +	/* If SPE is implemented, */
> > > +	cmp	x1, #ID_AA64DFR0_EL1_PMSVer_IMP
> > > +	b.lt	.Lskip_spefds_\@
> > > +	/* we can read PMSIDR and */
> > > +	mrs_s	x1, SYS_PMSIDR_EL1
> > > +	and	x1, x1,  #PMSIDR_EL1_FDS
> > > +	/* if FEAT_SPE_FDS is implemented, */
> > > +	cbz	x1, .Lskip_spefds_\@
> > > +	/* disable traps to PMSDSFR. */
> > > +	orr	x0, x0, #HDFGRTR2_EL2_nPMSDSFR_EL1
> > 
> > Why is this being done here rather than alongside the existing SPE
> > configuration of HDFGRTR_EL2 and HDFGWTR_EL2 near the start of
> > __init_el2_fgt?
> > 
> I thought everything was separated by which trap configs it writes to,
> rather than the feature. This SPE feature is in HDFGRTR2 so I put it in
> __init_el2_fgt2 rather than __init_el2_fgt.

That's fair; __init_el2_fgt isn't the right place. But the redundancy of
re-reading PMSVer from DFR0 is a little jarring.

> I suppose we could have a single __init_el2_spe that writes to both HDFGRTR
> and HDFGRTR2 but we'd have to be careful to not overwrite what was already
> done in the other sections.

Right, perhaps it would be clearer to have trap-preserving macros for
features in a specific ID register rather than per-trap configuration
register macros.

In other words, we have something like __init_fgt_aa64dfr0 which would
configure the FGT and FGT2 registers based on features in aa64dfr0. I
think you'd need to have a play to see how it ends up looking but the
main thing to avoid is having duplicate ID register parsing code for
setting up FGT and FGT2 traps.

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source
  2025-07-14 14:04   ` Will Deacon
@ 2025-07-15 13:04     ` James Clark
  2025-07-17 14:29       ` Will Deacon
  0 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-07-15 13:04 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm



On 14/07/2025 3:04 pm, Will Deacon wrote:
> On Thu, Jun 05, 2025 at 11:49:05AM +0100, James Clark wrote:
>> SPE_FEAT_FDS adds the ability to filter on the data source of packets.
>> Like the other existing filters, enable filtering with PMSFCR_EL1.FDS
>> when any of the filter bits are set.
>>
>> Each bit maps to data sources 0-63 described by bits[0:5] in the data
>> source packet (although the full range of data source is 16 bits so
>> higher value data sources can't be filtered on). The filter is an OR of
>> all the bits, so for example setting bits 0 and 3 filters packets from
>> data sources 0 OR 3.
>>
>> Reviewed-by: Leo Yan <leo.yan@arm.com>
>> Tested-by: Leo Yan <leo.yan@arm.com>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   drivers/perf/arm_spe_pmu.c | 31 +++++++++++++++++++++++++++++++
>>   1 file changed, 31 insertions(+)
>>
>> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> index 9309b846f642..d04318411f77 100644
>> --- a/drivers/perf/arm_spe_pmu.c
>> +++ b/drivers/perf/arm_spe_pmu.c
>> @@ -87,6 +87,7 @@ struct arm_spe_pmu {
>>   #define SPE_PMU_FEAT_INV_FILT_EVT		(1UL << 6)
>>   #define SPE_PMU_FEAT_DISCARD			(1UL << 7)
>>   #define SPE_PMU_FEAT_EFT			(1UL << 8)
>> +#define SPE_PMU_FEAT_FDS			(1UL << 9)
>>   #define SPE_PMU_FEAT_DEV_PROBED			(1UL << 63)
>>   	u64					features;
>>   
>> @@ -232,6 +233,10 @@ static const struct attribute_group arm_spe_pmu_cap_group = {
>>   #define ATTR_CFG_FLD_inv_event_filter_LO	0
>>   #define ATTR_CFG_FLD_inv_event_filter_HI	63
>>   
>> +#define ATTR_CFG_FLD_data_src_filter_CFG	config4	/* PMSDSFR_EL1 */
>> +#define ATTR_CFG_FLD_data_src_filter_LO	0
>> +#define ATTR_CFG_FLD_data_src_filter_HI	63
>> +
>>   GEN_PMU_FORMAT_ATTR(ts_enable);
>>   GEN_PMU_FORMAT_ATTR(pa_enable);
>>   GEN_PMU_FORMAT_ATTR(pct_enable);
>> @@ -248,6 +253,7 @@ GEN_PMU_FORMAT_ATTR(float_filter);
>>   GEN_PMU_FORMAT_ATTR(float_filter_mask);
>>   GEN_PMU_FORMAT_ATTR(event_filter);
>>   GEN_PMU_FORMAT_ATTR(inv_event_filter);
>> +GEN_PMU_FORMAT_ATTR(data_src_filter);
>>   GEN_PMU_FORMAT_ATTR(min_latency);
>>   GEN_PMU_FORMAT_ATTR(discard);
>>   
>> @@ -268,6 +274,7 @@ static struct attribute *arm_spe_pmu_formats_attr[] = {
>>   	&format_attr_float_filter_mask.attr,
>>   	&format_attr_event_filter.attr,
>>   	&format_attr_inv_event_filter.attr,
>> +	&format_attr_data_src_filter.attr,
>>   	&format_attr_min_latency.attr,
>>   	&format_attr_discard.attr,
>>   	NULL,
>> @@ -286,6 +293,9 @@ static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
>>   	if (attr == &format_attr_inv_event_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_INV_FILT_EVT))
>>   		return 0;
>>   
>> +	if (attr == &format_attr_data_src_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_FDS))
>> +		return 0;
>> +
>>   	if ((attr == &format_attr_branch_filter_mask.attr ||
>>   	     attr == &format_attr_load_filter_mask.attr ||
>>   	     attr == &format_attr_store_filter_mask.attr ||
>> @@ -406,6 +416,9 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
>>   	if (ATTR_CFG_GET_FLD(attr, inv_event_filter))
>>   		reg |= PMSFCR_EL1_FnE;
>>   
>> +	if (ATTR_CFG_GET_FLD(attr, data_src_filter))
>> +		reg |= PMSFCR_EL1_FDS;
> 
> Is the polarity correct here? The description of PMSDSFR_EL1.S<m> suggests
> that setting bits to 1 _excludes_ the FDS filtering.
> 

Setting filter bits to 1 means that samples matching are included. 
Setting bits to 0 means that they are excluded. And PMSFCR_EL1.FDS 
enables filtering as a whole, so if the user sets any filter bit to 1 we 
want to enable filtering:

   PMSDSFR_EL1.S<m>

   0b0  If PMSFCR_EL1.FDS is 1, do not record load operations that have
        bits [5:0] of the Data Source packet set to <m>.

   0b1  Load operations with Data Source <m> are unaffected by
        PMSFCR_EL1.FDS.

I think it's all the right way around and it ends up being the same as 
the other filters in SPE. Because we're using any bit being set to 
enable the filtering, the only thing you can't do is enable filtering 
with a 0 filter, but I didn't think that was useful. See the previous 
discussion on this here: 
https://lore.kernel.org/all/5752f039-51c1-4452-b5df-03ff06da7be3@linaro.org/

Reading the "Data source filtering" section in the docs change at the 
end might help too.

>>   	if (ATTR_CFG_GET_FLD(attr, min_latency))
>>   		reg |= PMSFCR_EL1_FL;
>>   
>> @@ -430,6 +443,12 @@ static u64 arm_spe_event_to_pmslatfr(struct perf_event *event)
>>   	return FIELD_PREP(PMSLATFR_EL1_MINLAT, ATTR_CFG_GET_FLD(attr, min_latency));
>>   }
>>   
>> +static u64 arm_spe_event_to_pmsdsfr(struct perf_event *event)
>> +{
>> +	struct perf_event_attr *attr = &event->attr;
>> +	return ATTR_CFG_GET_FLD(attr, data_src_filter);
>> +}
>> +
>>   static void arm_spe_pmu_pad_buf(struct perf_output_handle *handle, int len)
>>   {
>>   	struct arm_spe_pmu_buf *buf = perf_get_aux(handle);
>> @@ -788,6 +807,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>>   	if (arm_spe_event_to_pmsnevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
>>   		return -EOPNOTSUPP;
>>   
>> +	if (arm_spe_event_to_pmsdsfr(event) &&
>> +	    !(spe_pmu->features & SPE_PMU_FEAT_FDS))
>> +		return -EOPNOTSUPP;
> 
> Same question here.
> 
> Will




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  2025-07-15 12:57       ` Will Deacon
@ 2025-07-15 13:10         ` James Clark
  2025-07-15 13:28           ` James Clark
  0 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-07-15 13:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm



On 15/07/2025 1:57 pm, Will Deacon wrote:
> On Tue, Jul 15, 2025 at 01:48:03PM +0100, James Clark wrote:
>>
>>
>> On 14/07/2025 2:54 pm, Will Deacon wrote:
>>> On Thu, Jun 05, 2025 at 11:49:02AM +0100, James Clark wrote:
>>>> SPE data source filtering (optional from Armv8.8) requires that traps to
>>>> the filter register PMSDSFR be disabled. Document the requirements and
>>>> disable the traps if the feature is present.
>>>>
>>>> Tested-by: Leo Yan <leo.yan@arm.com>
>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>> ---
>>>>    Documentation/arch/arm64/booting.rst | 11 +++++++++++
>>>>    arch/arm64/include/asm/el2_setup.h   | 14 ++++++++++++++
>>>>    2 files changed, 25 insertions(+)
>>>>
>>>> diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
>>>> index dee7b6de864f..abd75085a239 100644
>>>> --- a/Documentation/arch/arm64/booting.rst
>>>> +++ b/Documentation/arch/arm64/booting.rst
>>>> @@ -404,6 +404,17 @@ Before jumping into the kernel, the following conditions must be met:
>>>>        - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
>>>>        - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
>>>> +  For CPUs with SPE data source filtering (FEAT_SPE_FDS):
>>>> +
>>>> +  - If EL3 is present:
>>>> +
>>>> +    - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
>>>> +
>>>> +  - If the kernel is entered at EL1 and EL2 is present:
>>>> +
>>>> +    - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
>>>> +    - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
>>>> +
>>>>      For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
>>>>      - If the kernel is entered at EL1 and EL2 is present:
>>>> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
>>>> index 1e7c7475e43f..02b4a7fc016e 100644
>>>> --- a/arch/arm64/include/asm/el2_setup.h
>>>> +++ b/arch/arm64/include/asm/el2_setup.h
>>>> @@ -279,6 +279,20 @@
>>>>    	orr	x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
>>>>    	orr	x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
>>>>    .Lskip_pmuv3p9_\@:
>>>> +	mrs	x1, id_aa64dfr0_el1
>>>> +	ubfx	x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
>>>> +	/* If SPE is implemented, */
>>>> +	cmp	x1, #ID_AA64DFR0_EL1_PMSVer_IMP
>>>> +	b.lt	.Lskip_spefds_\@
>>>> +	/* we can read PMSIDR and */
>>>> +	mrs_s	x1, SYS_PMSIDR_EL1
>>>> +	and	x1, x1,  #PMSIDR_EL1_FDS
>>>> +	/* if FEAT_SPE_FDS is implemented, */
>>>> +	cbz	x1, .Lskip_spefds_\@
>>>> +	/* disable traps to PMSDSFR. */
>>>> +	orr	x0, x0, #HDFGRTR2_EL2_nPMSDSFR_EL1
>>>
>>> Why is this being done here rather than alongside the existing SPE
>>> configuration of HDFGRTR_EL2 and HDFGWTR_EL2 near the start of
>>> __init_el2_fgt?
>>>
>> I thought everything was separated by which trap configs it writes to,
>> rather than the feature. This SPE feature is in HDFGRTR2 so I put it in
>> __init_el2_fgt2 rather than __init_el2_fgt.
> 
> That's fair; __init_el2_fgt isn't the right place. But the redundancy of
> re-reading PMSVer from DFR0 is a little jarring.
> 
>> I suppose we could have a single __init_el2_spe that writes to both HDFGRTR
>> and HDFGRTR2 but we'd have to be careful to not overwrite what was already
>> done in the other sections.
> 
> Right, perhaps it would be clearer to have trap-preserving macros for
> features in a specific ID register rather than per-trap configuration
> register macros.
> 
> In other words, we have something like __init_fgt_aa64dfr0 which would
> configure the FGT and FGT2 registers based on features in aa64dfr0. I
> think you'd need to have a play to see how it ends up looking but the
> main thing to avoid is having duplicate ID register parsing code for
> setting up FGT and FGT2 traps.
> 
> Will

I'll give it a go but that could end up being fragile to something that 
is dependent on two different ID registers in the future. Then we'd end 
up in the same situation for a different reason.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  2025-07-15 13:10         ` James Clark
@ 2025-07-15 13:28           ` James Clark
  2025-07-17 11:52             ` Will Deacon
  0 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-07-15 13:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm



On 15/07/2025 2:10 pm, James Clark wrote:
> 
> 
> On 15/07/2025 1:57 pm, Will Deacon wrote:
>> On Tue, Jul 15, 2025 at 01:48:03PM +0100, James Clark wrote:
>>>
>>>
>>> On 14/07/2025 2:54 pm, Will Deacon wrote:
>>>> On Thu, Jun 05, 2025 at 11:49:02AM +0100, James Clark wrote:
>>>>> SPE data source filtering (optional from Armv8.8) requires that 
>>>>> traps to
>>>>> the filter register PMSDSFR be disabled. Document the requirements and
>>>>> disable the traps if the feature is present.
>>>>>
>>>>> Tested-by: Leo Yan <leo.yan@arm.com>
>>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>>> ---
>>>>>    Documentation/arch/arm64/booting.rst | 11 +++++++++++
>>>>>    arch/arm64/include/asm/el2_setup.h   | 14 ++++++++++++++
>>>>>    2 files changed, 25 insertions(+)
>>>>>
>>>>> diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/ 
>>>>> arch/arm64/booting.rst
>>>>> index dee7b6de864f..abd75085a239 100644
>>>>> --- a/Documentation/arch/arm64/booting.rst
>>>>> +++ b/Documentation/arch/arm64/booting.rst
>>>>> @@ -404,6 +404,17 @@ Before jumping into the kernel, the following 
>>>>> conditions must be met:
>>>>>        - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 
>>>>> 0b1.
>>>>>        - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
>>>>> +  For CPUs with SPE data source filtering (FEAT_SPE_FDS):
>>>>> +
>>>>> +  - If EL3 is present:
>>>>> +
>>>>> +    - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
>>>>> +
>>>>> +  - If the kernel is entered at EL1 and EL2 is present:
>>>>> +
>>>>> +    - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
>>>>> +    - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
>>>>> +
>>>>>      For CPUs with Memory Copy and Memory Set instructions 
>>>>> (FEAT_MOPS):
>>>>>      - If the kernel is entered at EL1 and EL2 is present:
>>>>> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/ 
>>>>> include/asm/el2_setup.h
>>>>> index 1e7c7475e43f..02b4a7fc016e 100644
>>>>> --- a/arch/arm64/include/asm/el2_setup.h
>>>>> +++ b/arch/arm64/include/asm/el2_setup.h
>>>>> @@ -279,6 +279,20 @@
>>>>>        orr    x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
>>>>>        orr    x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
>>>>>    .Lskip_pmuv3p9_\@:
>>>>> +    mrs    x1, id_aa64dfr0_el1
>>>>> +    ubfx    x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
>>>>> +    /* If SPE is implemented, */
>>>>> +    cmp    x1, #ID_AA64DFR0_EL1_PMSVer_IMP
>>>>> +    b.lt    .Lskip_spefds_\@
>>>>> +    /* we can read PMSIDR and */
>>>>> +    mrs_s    x1, SYS_PMSIDR_EL1
>>>>> +    and    x1, x1,  #PMSIDR_EL1_FDS
>>>>> +    /* if FEAT_SPE_FDS is implemented, */
>>>>> +    cbz    x1, .Lskip_spefds_\@
>>>>> +    /* disable traps to PMSDSFR. */
>>>>> +    orr    x0, x0, #HDFGRTR2_EL2_nPMSDSFR_EL1
>>>>
>>>> Why is this being done here rather than alongside the existing SPE
>>>> configuration of HDFGRTR_EL2 and HDFGWTR_EL2 near the start of
>>>> __init_el2_fgt?
>>>>
>>> I thought everything was separated by which trap configs it writes to,
>>> rather than the feature. This SPE feature is in HDFGRTR2 so I put it in
>>> __init_el2_fgt2 rather than __init_el2_fgt.
>>
>> That's fair; __init_el2_fgt isn't the right place. But the redundancy of
>> re-reading PMSVer from DFR0 is a little jarring.
>>
>>> I suppose we could have a single __init_el2_spe that writes to both 
>>> HDFGRTR
>>> and HDFGRTR2 but we'd have to be careful to not overwrite what was 
>>> already
>>> done in the other sections.
>>
>> Right, perhaps it would be clearer to have trap-preserving macros for
>> features in a specific ID register rather than per-trap configuration
>> register macros.
>>
>> In other words, we have something like __init_fgt_aa64dfr0 which would
>> configure the FGT and FGT2 registers based on features in aa64dfr0. I
>> think you'd need to have a play to see how it ends up looking but the
>> main thing to avoid is having duplicate ID register parsing code for
>> setting up FGT and FGT2 traps.
>>
>> Will
> 
> I'll give it a go but that could end up being fragile to something that 
> is dependent on two different ID registers in the future. Then we'd end 
> up in the same situation for a different reason.
> 

I think I've run into it already. Wouldn't checking for FGT and FGT2 
have to be repeated when doing each ID register? Now we only do that 
once at the start of __init_el2_fgt and __init_el2_fgt2, even if we 
might sometimes check a different ID register twice. But if we flipped 
it we'd always have to repeat those.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  2025-07-15 13:28           ` James Clark
@ 2025-07-17 11:52             ` Will Deacon
  0 siblings, 0 replies; 37+ messages in thread
From: Will Deacon @ 2025-07-17 11:52 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Tue, Jul 15, 2025 at 02:28:19PM +0100, James Clark wrote:
> 
> 
> On 15/07/2025 2:10 pm, James Clark wrote:
> > 
> > 
> > On 15/07/2025 1:57 pm, Will Deacon wrote:
> > > On Tue, Jul 15, 2025 at 01:48:03PM +0100, James Clark wrote:
> > > > 
> > > > 
> > > > On 14/07/2025 2:54 pm, Will Deacon wrote:
> > > > > On Thu, Jun 05, 2025 at 11:49:02AM +0100, James Clark wrote:
> > > > > > SPE data source filtering (optional from Armv8.8)
> > > > > > requires that traps to
> > > > > > the filter register PMSDSFR be disabled. Document the requirements and
> > > > > > disable the traps if the feature is present.
> > > > > > 
> > > > > > Tested-by: Leo Yan <leo.yan@arm.com>
> > > > > > Signed-off-by: James Clark <james.clark@linaro.org>
> > > > > > ---
> > > > > >    Documentation/arch/arm64/booting.rst | 11 +++++++++++
> > > > > >    arch/arm64/include/asm/el2_setup.h   | 14 ++++++++++++++
> > > > > >    2 files changed, 25 insertions(+)
> > > > > > 
> > > > > > diff --git a/Documentation/arch/arm64/booting.rst
> > > > > > b/Documentation/ arch/arm64/booting.rst
> > > > > > index dee7b6de864f..abd75085a239 100644
> > > > > > --- a/Documentation/arch/arm64/booting.rst
> > > > > > +++ b/Documentation/arch/arm64/booting.rst
> > > > > > @@ -404,6 +404,17 @@ Before jumping into the kernel, the
> > > > > > following conditions must be met:
> > > > > >        - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be
> > > > > > initialised to 0b1.
> > > > > >        - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
> > > > > > +  For CPUs with SPE data source filtering (FEAT_SPE_FDS):
> > > > > > +
> > > > > > +  - If EL3 is present:
> > > > > > +
> > > > > > +    - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
> > > > > > +
> > > > > > +  - If the kernel is entered at EL1 and EL2 is present:
> > > > > > +
> > > > > > +    - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
> > > > > > +    - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
> > > > > > +
> > > > > >      For CPUs with Memory Copy and Memory Set
> > > > > > instructions (FEAT_MOPS):
> > > > > >      - If the kernel is entered at EL1 and EL2 is present:
> > > > > > diff --git a/arch/arm64/include/asm/el2_setup.h
> > > > > > b/arch/arm64/ include/asm/el2_setup.h
> > > > > > index 1e7c7475e43f..02b4a7fc016e 100644
> > > > > > --- a/arch/arm64/include/asm/el2_setup.h
> > > > > > +++ b/arch/arm64/include/asm/el2_setup.h
> > > > > > @@ -279,6 +279,20 @@
> > > > > >        orr    x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
> > > > > >        orr    x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
> > > > > >    .Lskip_pmuv3p9_\@:
> > > > > > +    mrs    x1, id_aa64dfr0_el1
> > > > > > +    ubfx    x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
> > > > > > +    /* If SPE is implemented, */
> > > > > > +    cmp    x1, #ID_AA64DFR0_EL1_PMSVer_IMP
> > > > > > +    b.lt    .Lskip_spefds_\@
> > > > > > +    /* we can read PMSIDR and */
> > > > > > +    mrs_s    x1, SYS_PMSIDR_EL1
> > > > > > +    and    x1, x1,  #PMSIDR_EL1_FDS
> > > > > > +    /* if FEAT_SPE_FDS is implemented, */
> > > > > > +    cbz    x1, .Lskip_spefds_\@
> > > > > > +    /* disable traps to PMSDSFR. */
> > > > > > +    orr    x0, x0, #HDFGRTR2_EL2_nPMSDSFR_EL1
> > > > > 
> > > > > Why is this being done here rather than alongside the existing SPE
> > > > > configuration of HDFGRTR_EL2 and HDFGWTR_EL2 near the start of
> > > > > __init_el2_fgt?
> > > > > 
> > > > I thought everything was separated by which trap configs it writes to,
> > > > rather than the feature. This SPE feature is in HDFGRTR2 so I put it in
> > > > __init_el2_fgt2 rather than __init_el2_fgt.
> > > 
> > > That's fair; __init_el2_fgt isn't the right place. But the redundancy of
> > > re-reading PMSVer from DFR0 is a little jarring.
> > > 
> > > > I suppose we could have a single __init_el2_spe that writes to
> > > > both HDFGRTR
> > > > and HDFGRTR2 but we'd have to be careful to not overwrite what
> > > > was already
> > > > done in the other sections.
> > > 
> > > Right, perhaps it would be clearer to have trap-preserving macros for
> > > features in a specific ID register rather than per-trap configuration
> > > register macros.
> > > 
> > > In other words, we have something like __init_fgt_aa64dfr0 which would
> > > configure the FGT and FGT2 registers based on features in aa64dfr0. I
> > > think you'd need to have a play to see how it ends up looking but the
> > > main thing to avoid is having duplicate ID register parsing code for
> > > setting up FGT and FGT2 traps.
> > > 
> > 
> > I'll give it a go but that could end up being fragile to something that
> > is dependent on two different ID registers in the future. Then we'd end
> > up in the same situation for a different reason.
> > 
> 
> I think I've run into it already. Wouldn't checking for FGT and FGT2 have to
> be repeated when doing each ID register? Now we only do that once at the
> start of __init_el2_fgt and __init_el2_fgt2, even if we might sometimes
> check a different ID register twice. But if we flipped it we'd always have
> to repeat those.

Bah, this is quite horrible! Maybe the best we can do for now is have a
macro for safely getting at PMBIDR?

At some point, I suspect this whole FGT-configuration logic will need
reworking but at the moment it's hard to see what the best approach
would be.

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source
  2025-07-15 13:04     ` James Clark
@ 2025-07-17 14:29       ` Will Deacon
  2025-07-17 15:16         ` James Clark
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2025-07-17 14:29 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Tue, Jul 15, 2025 at 02:04:18PM +0100, James Clark wrote:
> On 14/07/2025 3:04 pm, Will Deacon wrote:
> > On Thu, Jun 05, 2025 at 11:49:05AM +0100, James Clark wrote:
> > > @@ -406,6 +416,9 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
> > >   	if (ATTR_CFG_GET_FLD(attr, inv_event_filter))
> > >   		reg |= PMSFCR_EL1_FnE;
> > > +	if (ATTR_CFG_GET_FLD(attr, data_src_filter))
> > > +		reg |= PMSFCR_EL1_FDS;
> > 
> > Is the polarity correct here? The description of PMSDSFR_EL1.S<m> suggests
> > that setting bits to 1 _excludes_ the FDS filtering.
> > 
> 
> Setting filter bits to 1 means that samples matching are included. Setting
> bits to 0 means that they are excluded. And PMSFCR_EL1.FDS enables filtering
> as a whole, so if the user sets any filter bit to 1 we want to enable
> filtering:
> 
>   PMSDSFR_EL1.S<m>
> 
>   0b0  If PMSFCR_EL1.FDS is 1, do not record load operations that have
>        bits [5:0] of the Data Source packet set to <m>.
> 
>   0b1  Load operations with Data Source <m> are unaffected by
>        PMSFCR_EL1.FDS.
> 
> I think it's all the right way around and it ends up being the same as the
> other filters in SPE. Because we're using any bit being set to enable the
> filtering, the only thing you can't do is enable filtering with a 0 filter,
> but I didn't think that was useful. See the previous discussion on this
> here:
> https://lore.kernel.org/all/5752f039-51c1-4452-b5df-03ff06da7be3@linaro.org/
> 
> Reading the "Data source filtering" section in the docs change at the end
> might help too.

Sorry, but I still don't get it :/

afaict, if any of the bits in 'data_src_filter' are _zero_ then we
should set PMSFCR_EL1.FDS. That also means that a mask of zero means all
loads are filtered, which is what the architecture says and is what we
should provide to userspace.

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source
  2025-07-17 14:29       ` Will Deacon
@ 2025-07-17 15:16         ` James Clark
  2025-07-17 15:27           ` Will Deacon
  0 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2025-07-17 15:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm



On 17/07/2025 3:29 pm, Will Deacon wrote:
> On Tue, Jul 15, 2025 at 02:04:18PM +0100, James Clark wrote:
>> On 14/07/2025 3:04 pm, Will Deacon wrote:
>>> On Thu, Jun 05, 2025 at 11:49:05AM +0100, James Clark wrote:
>>>> @@ -406,6 +416,9 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
>>>>    	if (ATTR_CFG_GET_FLD(attr, inv_event_filter))
>>>>    		reg |= PMSFCR_EL1_FnE;
>>>> +	if (ATTR_CFG_GET_FLD(attr, data_src_filter))
>>>> +		reg |= PMSFCR_EL1_FDS;
>>>
>>> Is the polarity correct here? The description of PMSDSFR_EL1.S<m> suggests
>>> that setting bits to 1 _excludes_ the FDS filtering.
>>>
>>
>> Setting filter bits to 1 means that samples matching are included. Setting
>> bits to 0 means that they are excluded. And PMSFCR_EL1.FDS enables filtering
>> as a whole, so if the user sets any filter bit to 1 we want to enable
>> filtering:
>>
>>    PMSDSFR_EL1.S<m>
>>
>>    0b0  If PMSFCR_EL1.FDS is 1, do not record load operations that have
>>         bits [5:0] of the Data Source packet set to <m>.
>>
>>    0b1  Load operations with Data Source <m> are unaffected by
>>         PMSFCR_EL1.FDS.
>>
>> I think it's all the right way around and it ends up being the same as the
>> other filters in SPE. Because we're using any bit being set to enable the
>> filtering, the only thing you can't do is enable filtering with a 0 filter,
>> but I didn't think that was useful. See the previous discussion on this
>> here:
>> https://lore.kernel.org/all/5752f039-51c1-4452-b5df-03ff06da7be3@linaro.org/
>>
>> Reading the "Data source filtering" section in the docs change at the end
>> might help too.
> 
> Sorry, but I still don't get it :/
> 
> afaict, if any of the bits in 'data_src_filter' are _zero_ then we
> should set PMSFCR_EL1.FDS. That also means that a mask of zero means all
> loads are filtered, which is what the architecture says and is what we
> should provide to userspace.
> 
> Will

We'd have to add another format flag to enable data source filtering 
then, because otherwise the default would be zero and people's samples 
would disappear.

But the only use cases I could think of were more like "I want to see 
samples from data source 1":

   -e arm_spe/data_src_filter=0x1/

Or "I want to see all data sources except 1":

   -e arm_spe/data_src_filter=0xfffffffe/

Filtering out all samples with any data source didn't seem to make sense 
to me, and I think you can already do that with the other filters 
(remove loads etc).

It would be a shame to be inconsistent and to add an enable flag just 
for that one case because the other filters in SPE are auto enabled for 
non-zero values. Although to be fair for PMSFCR.FT and others, zero 
filters are explicitly not allowed:

   If this field is set to 1 and the PMSFCR_EL1.{ST, LD, B} bits are all
   set to zero, it is CONSTRAINED UNPREDICTABLE whether no samples are
   recorded or the PE behaves as if PMSFCR_EL1.FT is set to 0

Seems like FDS doesn't end up as neat as the others, but IMO I can't see 
anyone needing a zero filter. I did discuss it with Leo and we decided 
that we could always add the enable flag at a later date if a use case 
turned up and it wouldn't be a breaking change.

But if you think it's there so it should be exposed I can add it.

James


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source
  2025-07-17 15:16         ` James Clark
@ 2025-07-17 15:27           ` Will Deacon
  2025-07-17 16:42             ` James Clark
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2025-07-17 15:27 UTC (permalink / raw)
  To: James Clark
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm

On Thu, Jul 17, 2025 at 04:16:32PM +0100, James Clark wrote:
> 
> 
> On 17/07/2025 3:29 pm, Will Deacon wrote:
> > On Tue, Jul 15, 2025 at 02:04:18PM +0100, James Clark wrote:
> > > On 14/07/2025 3:04 pm, Will Deacon wrote:
> > > > On Thu, Jun 05, 2025 at 11:49:05AM +0100, James Clark wrote:
> > > > > @@ -406,6 +416,9 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
> > > > >    	if (ATTR_CFG_GET_FLD(attr, inv_event_filter))
> > > > >    		reg |= PMSFCR_EL1_FnE;
> > > > > +	if (ATTR_CFG_GET_FLD(attr, data_src_filter))
> > > > > +		reg |= PMSFCR_EL1_FDS;
> > > > 
> > > > Is the polarity correct here? The description of PMSDSFR_EL1.S<m> suggests
> > > > that setting bits to 1 _excludes_ the FDS filtering.
> > > > 
> > > 
> > > Setting filter bits to 1 means that samples matching are included. Setting
> > > bits to 0 means that they are excluded. And PMSFCR_EL1.FDS enables filtering
> > > as a whole, so if the user sets any filter bit to 1 we want to enable
> > > filtering:
> > > 
> > >    PMSDSFR_EL1.S<m>
> > > 
> > >    0b0  If PMSFCR_EL1.FDS is 1, do not record load operations that have
> > >         bits [5:0] of the Data Source packet set to <m>.
> > > 
> > >    0b1  Load operations with Data Source <m> are unaffected by
> > >         PMSFCR_EL1.FDS.
> > > 
> > > I think it's all the right way around and it ends up being the same as the
> > > other filters in SPE. Because we're using any bit being set to enable the
> > > filtering, the only thing you can't do is enable filtering with a 0 filter,
> > > but I didn't think that was useful. See the previous discussion on this
> > > here:
> > > https://lore.kernel.org/all/5752f039-51c1-4452-b5df-03ff06da7be3@linaro.org/
> > > 
> > > Reading the "Data source filtering" section in the docs change at the end
> > > might help too.
> > 
> > Sorry, but I still don't get it :/
> > 
> > afaict, if any of the bits in 'data_src_filter' are _zero_ then we
> > should set PMSFCR_EL1.FDS. That also means that a mask of zero means all
> > loads are filtered, which is what the architecture says and is what we
> > should provide to userspace.
> > 
> > Will
> 
> We'd have to add another format flag to enable data source filtering then,
> because otherwise the default would be zero and people's samples would
> disappear.
> 
> But the only use cases I could think of were more like "I want to see
> samples from data source 1":
> 
>   -e arm_spe/data_src_filter=0x1/
> 
> Or "I want to see all data sources except 1":
> 
>   -e arm_spe/data_src_filter=0xfffffffe/
> 
> Filtering out all samples with any data source didn't seem to make sense to
> me, and I think you can already do that with the other filters (remove loads
> etc).
> 
> It would be a shame to be inconsistent and to add an enable flag just for
> that one case because the other filters in SPE are auto enabled for non-zero
> values. Although to be fair for PMSFCR.FT and others, zero filters are
> explicitly not allowed:
> 
>   If this field is set to 1 and the PMSFCR_EL1.{ST, LD, B} bits are all
>   set to zero, it is CONSTRAINED UNPREDICTABLE whether no samples are
>   recorded or the PE behaves as if PMSFCR_EL1.FT is set to 0
> 
> Seems like FDS doesn't end up as neat as the others, but IMO I can't see
> anyone needing a zero filter. I did discuss it with Leo and we decided that
> we could always add the enable flag at a later date if a use case turned up
> and it wouldn't be a breaking change.
> 
> But if you think it's there so it should be exposed I can add it.

What about if we expose the inverse of PMSDSFR_EL1 to userspace instead?

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source
  2025-07-17 15:27           ` Will Deacon
@ 2025-07-17 16:42             ` James Clark
  0 siblings, 0 replies; 37+ messages in thread
From: James Clark @ 2025-07-17 16:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, leo.yan, linux-arm-kernel, linux-kernel,
	linux-perf-users, linux-doc, kvmarm



On 17/07/2025 4:27 pm, Will Deacon wrote:
> On Thu, Jul 17, 2025 at 04:16:32PM +0100, James Clark wrote:
>>
>>
>> On 17/07/2025 3:29 pm, Will Deacon wrote:
>>> On Tue, Jul 15, 2025 at 02:04:18PM +0100, James Clark wrote:
>>>> On 14/07/2025 3:04 pm, Will Deacon wrote:
>>>>> On Thu, Jun 05, 2025 at 11:49:05AM +0100, James Clark wrote:
>>>>>> @@ -406,6 +416,9 @@ static u64 arm_spe_event_to_pmsfcr(struct perf_event *event)
>>>>>>     	if (ATTR_CFG_GET_FLD(attr, inv_event_filter))
>>>>>>     		reg |= PMSFCR_EL1_FnE;
>>>>>> +	if (ATTR_CFG_GET_FLD(attr, data_src_filter))
>>>>>> +		reg |= PMSFCR_EL1_FDS;
>>>>>
>>>>> Is the polarity correct here? The description of PMSDSFR_EL1.S<m> suggests
>>>>> that setting bits to 1 _excludes_ the FDS filtering.
>>>>>
>>>>
>>>> Setting filter bits to 1 means that samples matching are included. Setting
>>>> bits to 0 means that they are excluded. And PMSFCR_EL1.FDS enables filtering
>>>> as a whole, so if the user sets any filter bit to 1 we want to enable
>>>> filtering:
>>>>
>>>>     PMSDSFR_EL1.S<m>
>>>>
>>>>     0b0  If PMSFCR_EL1.FDS is 1, do not record load operations that have
>>>>          bits [5:0] of the Data Source packet set to <m>.
>>>>
>>>>     0b1  Load operations with Data Source <m> are unaffected by
>>>>          PMSFCR_EL1.FDS.
>>>>
>>>> I think it's all the right way around and it ends up being the same as the
>>>> other filters in SPE. Because we're using any bit being set to enable the
>>>> filtering, the only thing you can't do is enable filtering with a 0 filter,
>>>> but I didn't think that was useful. See the previous discussion on this
>>>> here:
>>>> https://lore.kernel.org/all/5752f039-51c1-4452-b5df-03ff06da7be3@linaro.org/
>>>>
>>>> Reading the "Data source filtering" section in the docs change at the end
>>>> might help too.
>>>
>>> Sorry, but I still don't get it :/
>>>
>>> afaict, if any of the bits in 'data_src_filter' are _zero_ then we
>>> should set PMSFCR_EL1.FDS. That also means that a mask of zero means all
>>> loads are filtered, which is what the architecture says and is what we
>>> should provide to userspace.
>>>
>>> Will
>>
>> We'd have to add another format flag to enable data source filtering then,
>> because otherwise the default would be zero and people's samples would
>> disappear.
>>
>> But the only use cases I could think of were more like "I want to see
>> samples from data source 1":
>>
>>    -e arm_spe/data_src_filter=0x1/
>>
>> Or "I want to see all data sources except 1":
>>
>>    -e arm_spe/data_src_filter=0xfffffffe/
>>
>> Filtering out all samples with any data source didn't seem to make sense to
>> me, and I think you can already do that with the other filters (remove loads
>> etc).
>>
>> It would be a shame to be inconsistent and to add an enable flag just for
>> that one case because the other filters in SPE are auto enabled for non-zero
>> values. Although to be fair for PMSFCR.FT and others, zero filters are
>> explicitly not allowed:
>>
>>    If this field is set to 1 and the PMSFCR_EL1.{ST, LD, B} bits are all
>>    set to zero, it is CONSTRAINED UNPREDICTABLE whether no samples are
>>    recorded or the PE behaves as if PMSFCR_EL1.FT is set to 0
>>
>> Seems like FDS doesn't end up as neat as the others, but IMO I can't see
>> anyone needing a zero filter. I did discuss it with Leo and we decided that
>> we could always add the enable flag at a later date if a use case turned up
>> and it wouldn't be a breaking change.
>>
>> But if you think it's there so it should be exposed I can add it.
> 
> What about if we expose the inverse of PMSDSFR_EL1 to userspace instead?
> 
> Will

That's exactly what I was thinking. It does seem a bit weird, but I can 
call it inv_data_src_filter and document it appropriately so it should 
be fine.


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2025-07-17 16:42 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-05 10:48 [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark
2025-06-05 10:48 ` [PATCH v3 01/10] arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register James Clark
2025-06-12  6:44   ` Anshuman Khandual
2025-07-14 13:32   ` Will Deacon
2025-06-05 10:49 ` [PATCH v3 02/10] perf: arm_spe: Support FEAT_SPEv1p4 filters James Clark
2025-06-12  7:35   ` Anshuman Khandual
2025-06-12  8:42     ` James Clark
2025-07-14 13:26   ` Will Deacon
2025-07-15 11:23     ` James Clark
2025-06-05 10:49 ` [PATCH v3 03/10] perf: arm_spe: Add support for FEAT_SPE_EFT extended filtering James Clark
2025-07-14 13:46   ` Will Deacon
2025-07-15 12:39     ` James Clark
2025-06-05 10:49 ` [PATCH v3 04/10] arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS James Clark
2025-07-14 13:54   ` Will Deacon
2025-07-15 12:48     ` James Clark
2025-07-15 12:57       ` Will Deacon
2025-07-15 13:10         ` James Clark
2025-07-15 13:28           ` James Clark
2025-07-17 11:52             ` Will Deacon
2025-06-05 10:49 ` [PATCH v3 05/10] KVM: arm64: Add trap configs for PMSDSFR_EL1 James Clark
2025-07-09  9:53   ` Joey Gouly
2025-06-05 10:49 ` [PATCH v3 06/10] perf: Add perf_event_attr::config4 James Clark
2025-06-30 15:35   ` Ian Rogers
2025-07-14 13:56   ` Will Deacon
2025-06-05 10:49 ` [PATCH v3 07/10] perf: arm_spe: Add support for filtering on data source James Clark
2025-07-14 14:04   ` Will Deacon
2025-07-15 13:04     ` James Clark
2025-07-17 14:29       ` Will Deacon
2025-07-17 15:16         ` James Clark
2025-07-17 15:27           ` Will Deacon
2025-07-17 16:42             ` James Clark
2025-06-05 10:49 ` [PATCH v3 08/10] tools headers UAPI: Sync linux/perf_event.h with the kernel sources James Clark
2025-06-30 15:36   ` Ian Rogers
2025-06-05 10:49 ` [PATCH v3 09/10] perf tools: Add support for perf_event_attr::config4 James Clark
2025-06-05 10:49 ` [PATCH v3 10/10] perf docs: arm-spe: Document new SPE filtering features James Clark
2025-06-30 15:38   ` Ian Rogers
2025-06-30 13:26 ` [PATCH v3 00/10] perf: arm_spe: Armv8.8 SPE features James Clark

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).