linux-perf-users.vger.kernel.org archive mirror
* [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4
@ 2025-07-31 13:25 Leo Yan
  2025-07-31 13:25 ` [PATCH v4 01/15] perf: arm_spe: Support FEAT_SPEv1p4 filters Leo Yan
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

This series adds support for new event types introduced in Arm SPE v1.4.

The first two patches modify the SPE driver to expose an 'event_filter'
entry in the sysfs caps folder. These patches are part of James' series
"[PATCH v5 00/12] perf: arm_spe: Armv8.8 SPE features" [1]. They are
included here in case this series is merged independently of James'
series.

Patches 03 ~ 15 add support for the new events in the perf tool:

  * Patches 03 ~ 04: Fix for remote access.
  * Patches 05 ~ 06: Refactor for data source and event bits.
  * Patch 07       : Dump new event bits.
  * Patches 08 ~ 14: Enhance memory-level info for the new events.
  * Patch 15       : Combine analysis of data source and events.
		     As a result, Arm SPE can support both HITM and peer
		     modes (See the "--display" options in perf c2c).

This series has been tested on the FVP RevC platform.

Note: for a local HITM event, the emulation does not provide any info
for LLC. However, the perf c2c tool relies on LLC + HITM for accounting
local HITM. I manually set the LLC HIT flag to verify the
"perf c2c -d tot" command.

[1] https://lore.kernel.org/linux-arm-kernel/20250721-james-perf-feat_spe_eft-v5-0-a7bc533485a1@linaro.org/

---
Changes in v4:
- Refactor exposing event filter in SPE driver (Will).
- Rebased on latest perf-tools-next branch.
- Link to v3: https://lore.kernel.org/r/20250707-arm_spe_support_hitm_overhead_v1_public-v3-0-33ea82da3280@arm.com

Changes in v3:
- Retrieve CPU number from PMU type (Ian).
- Link to v2: https://lore.kernel.org/r/20250630-arm_spe_support_hitm_overhead_v1_public-v2-0-2e1afab313b9@arm.com

Changes in v2:
- Dropped the kernel change for caching "pmsevfr_res0" (James)
- Renamed the "events" entry to "event_filter" (James)
- Added a new refactoring patch 04 (James)
- Updated memory level info for remote access (James)
- Link to v1: https://lore.kernel.org/r/20250613-arm_spe_support_hitm_overhead_v1_public-v1-0-6faecf0a8775@arm.com

---
James Clark (2):
      perf: arm_spe: Support FEAT_SPEv1p4 filters
      perf arm_spe: Use full type for data_src

Leo Yan (13):
      perf: arm_spe: Expose event filter
      perf arm_spe: Correct setting remote access
      perf arm_spe: Correct memory level for remote access
      perf arm_spe: Directly propagate raw event
      perf arm_spe: Decode event types for new features
      perf arm_spe: Add "event_filter" entry in meta data
      perf arm_spe: Refine memory level filling
      perf arm_spe: Separate setting of memory levels for loads and stores
      perf arm_spe: Fill memory levels for FEAT_SPEv1p4
      perf arm_spe: Improve CPU number retrieving in per-thread mode
      perf arm_spe: Refactor arm_spe__get_metadata_by_cpu()
      perf arm_spe: Set HITM flag
      perf arm_spe: Allow parsing both data source and events

 arch/arm64/include/asm/sysreg.h                    |   9 -
 drivers/perf/arm_spe_pmu.c                         |  45 +++--
 tools/perf/arch/arm64/util/arm-spe.c               |   5 +
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c  |  37 +---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h  |  33 ++--
 .../util/arm-spe-decoder/arm-spe-pkt-decoder.c     |  14 ++
 .../util/arm-spe-decoder/arm-spe-pkt-decoder.h     |   7 +
 tools/perf/util/arm-spe.c                          | 220 ++++++++++++++++-----
 tools/perf/util/arm-spe.h                          |   2 +
 9 files changed, 241 insertions(+), 131 deletions(-)
---
base-commit: e0c9c1e67156b257e9f1e97e4e372961e4e03fcf
change-id: 20250610-arm_spe_support_hitm_overhead_v1_public-c4a263385434

Best regards,
-- 
Leo Yan <leo.yan@arm.com>



* [PATCH v4 01/15] perf: arm_spe: Support FEAT_SPEv1p4 filters
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 02/15] perf: arm_spe: Expose event filter Leo Yan
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

From: James Clark <james.clark@linaro.org>

FEAT_SPEv1p4 (optional from Armv8.8) adds some new filter bits and also
makes some previously available bits unavailable again, e.g.:

  E[30], bit [30]
  When FEAT_SPEv1p4 is _not_ implemented ...

Continuing to hard code the valid filter bits for each version isn't
scalable, and it also doesn't work for filter bits that aren't related
to the SPE version. For example, most bits have a further condition:

  E[15], bit [15]
  When ... and filtering on event 15 is supported:

Whether "filtering on event 15" is implemented or not is only
discoverable from the TRM of that specific CPU or by probing
PMSEVFR_EL1.

Instead of hard coding them, write all 1s to the PMSEVFR_EL1 register
and read it back to discover the RES0 bits. Unsupported bits are RAZ/WI
so should read as 0s.
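
As a rough illustration of the probing described above, here is a
user-space C sketch. The fake_filter_reg type and its helpers are
hypothetical stand-ins for PMSEVFR_EL1 and the sysreg accessors, not the
driver code, and the simulated register is assumed to follow RAZ/WI
strictly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a filter register; not the kernel sysreg API. */
struct fake_filter_reg {
	uint64_t implemented;	/* bits the simulated hardware implements */
	uint64_t value;		/* current register contents */
};

/* Writes to unimplemented bits are ignored (the "WI" half of RAZ/WI). */
static void fake_reg_write(struct fake_filter_reg *reg, uint64_t val)
{
	reg->value = val & reg->implemented;
}

/* Unimplemented bits always read as zero (the "RAZ" half of RAZ/WI). */
static uint64_t fake_reg_read(const struct fake_filter_reg *reg)
{
	return reg->value & reg->implemented;
}

/* Write all 1s, read back and invert: bits that did not stick are RES0. */
static uint64_t probe_res0(struct fake_filter_reg *reg)
{
	assert(reg);
	fake_reg_write(reg, UINT64_MAX);
	return ~fake_reg_read(reg);
}

/* Same shape as the event_init check: reject filters touching RES0 bits. */
static int filter_is_supported(uint64_t requested, uint64_t res0)
{
	return (requested & res0) == 0;
}
```

With a simulated register whose implemented mask is 0x3fe08fe,
probe_res0() returns the complement of that mask, so a request for bit 21
passes filter_is_supported() while a request for bit 16 is rejected.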

For any hardware that doesn't strictly follow RAZ/WI for unsupported
filters: Any bits that should have been supported in a specific SPE
version but now incorrectly appear to be RES0 wouldn't have worked
anyway, so it's better to fail to open events that request them rather
than behaving unexpectedly. Bits that aren't implemented but also aren't
RAZ/WI will be incorrectly reported as supported, but allowing them to
be used is harmless.

Testing on N1SDP shows the probed RES0 bits to be the same as the hard
coded ones. The FVP with SPEv1p4 shows only additional new RES0 bits,
i.e. no previously hard coded RES0 bits are missing.

Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 arch/arm64/include/asm/sysreg.h |  9 ---------
 drivers/perf/arm_spe_pmu.c      | 23 +++++++----------------
 2 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index f1bb0d10c39a3f52d2c2bb6a8b9f4adb8cb371dc..e80207572786bcb1f1e17903da67ee8a37d11795 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -350,15 +350,6 @@
 #define SYS_PAR_EL1_ATTR		GENMASK_ULL(63, 56)
 #define SYS_PAR_EL1_F0_RES0		(GENMASK_ULL(6, 1) | GENMASK_ULL(55, 52))
 
-/*** Statistical Profiling Extension ***/
-#define PMSEVFR_EL1_RES0_IMP	\
-	(GENMASK_ULL(47, 32) | GENMASK_ULL(23, 16) | GENMASK_ULL(11, 8) |\
-	 BIT_ULL(6) | BIT_ULL(4) | BIT_ULL(2) | BIT_ULL(0))
-#define PMSEVFR_EL1_RES0_V1P1	\
-	(PMSEVFR_EL1_RES0_IMP & ~(BIT_ULL(18) | BIT_ULL(17) | BIT_ULL(11)))
-#define PMSEVFR_EL1_RES0_V1P2	\
-	(PMSEVFR_EL1_RES0_V1P1 & ~BIT_ULL(6))
-
 /* Buffer error reporting */
 #define PMBSR_EL1_FAULT_FSC_SHIFT	PMBSR_EL1_MSS_SHIFT
 #define PMBSR_EL1_FAULT_FSC_MASK	PMBSR_EL1_MSS_MASK
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 3efed8839a4ec5604eba242cb620327cd2a6a87d..051ec885318dd40776ffdedd7d387cdbbd35c52f 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -89,6 +89,7 @@ struct arm_spe_pmu {
 #define SPE_PMU_FEAT_DEV_PROBED			(1UL << 63)
 	u64					features;
 
+	u64					pmsevfr_res0;
 	u16					max_record_sz;
 	u16					align;
 	struct perf_output_handle __percpu	*handle;
@@ -693,20 +694,6 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
 	return IRQ_HANDLED;
 }
 
-static u64 arm_spe_pmsevfr_res0(u16 pmsver)
-{
-	switch (pmsver) {
-	case ID_AA64DFR0_EL1_PMSVer_IMP:
-		return PMSEVFR_EL1_RES0_IMP;
-	case ID_AA64DFR0_EL1_PMSVer_V1P1:
-		return PMSEVFR_EL1_RES0_V1P1;
-	case ID_AA64DFR0_EL1_PMSVer_V1P2:
-	/* Return the highest version we support in default */
-	default:
-		return PMSEVFR_EL1_RES0_V1P2;
-	}
-}
-
 /* Perf callbacks */
 static int arm_spe_pmu_event_init(struct perf_event *event)
 {
@@ -722,10 +709,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 	    !cpumask_test_cpu(event->cpu, &spe_pmu->supported_cpus))
 		return -ENOENT;
 
-	if (arm_spe_event_to_pmsevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
+	if (arm_spe_event_to_pmsevfr(event) & spe_pmu->pmsevfr_res0)
 		return -EOPNOTSUPP;
 
-	if (arm_spe_event_to_pmsnevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
+	if (arm_spe_event_to_pmsnevfr(event) & spe_pmu->pmsevfr_res0)
 		return -EOPNOTSUPP;
 
 	if (attr->exclude_idle)
@@ -1103,6 +1090,10 @@ static void __arm_spe_pmu_dev_probe(void *info)
 		spe_pmu->counter_sz = 16;
 	}
 
+	/* Write all 1s and then read back. Unsupported filter bits are RAZ/WI. */
+	write_sysreg_s(U64_MAX, SYS_PMSEVFR_EL1);
+	spe_pmu->pmsevfr_res0 = ~read_sysreg_s(SYS_PMSEVFR_EL1);
+
 	dev_info(dev,
 		 "probed SPEv1.%d for CPUs %*pbl [max_record_sz %u, align %u, features 0x%llx]\n",
 		 spe_pmu->pmsver - 1, cpumask_pr_args(&spe_pmu->supported_cpus),

-- 
2.34.1



* [PATCH v4 02/15] perf: arm_spe: Expose event filter
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
  2025-07-31 13:25 ` [PATCH v4 01/15] perf: arm_spe: Support FEAT_SPEv1p4 filters Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 03/15] perf arm_spe: Correct setting remote access Leo Yan
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

Expose an "event_filter" entry in the caps folder to inform user space
about which events can be filtered.

Change the return type of arm_spe_pmu_cap_get() from u32 to u64 to
accommodate the added event filter entry.

Co-developed-by: James Clark <james.clark@linaro.org>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 drivers/perf/arm_spe_pmu.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 051ec885318dd40776ffdedd7d387cdbbd35c52f..3e9221a22a61294b1c7b2c5e3550c3eac51dd399 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -116,6 +116,7 @@ enum arm_spe_pmu_capabilities {
 	SPE_PMU_CAP_FEAT_MAX,
 	SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX,
 	SPE_PMU_CAP_MIN_IVAL,
+	SPE_PMU_CAP_EVENT_FILTER,
 };
 
 static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
@@ -123,7 +124,7 @@ static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
 	[SPE_PMU_CAP_ERND]	= SPE_PMU_FEAT_ERND,
 };
 
-static u32 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
+static u64 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
 {
 	if (cap < SPE_PMU_CAP_FEAT_MAX)
 		return !!(spe_pmu->features & arm_spe_pmu_feat_caps[cap]);
@@ -133,6 +134,8 @@ static u32 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
 		return spe_pmu->counter_sz;
 	case SPE_PMU_CAP_MIN_IVAL:
 		return spe_pmu->min_period;
+	case SPE_PMU_CAP_EVENT_FILTER:
+		return ~spe_pmu->pmsevfr_res0;
 	default:
 		WARN(1, "unknown cap %d\n", cap);
 	}
@@ -149,7 +152,19 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
 		container_of(attr, struct dev_ext_attribute, attr);
 	int cap = (long)ea->var;
 
-	return sysfs_emit(buf, "%u\n", arm_spe_pmu_cap_get(spe_pmu, cap));
+	return sysfs_emit(buf, "%llu\n", arm_spe_pmu_cap_get(spe_pmu, cap));
+}
+
+static ssize_t arm_spe_pmu_cap_show_hex(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct arm_spe_pmu *spe_pmu = dev_get_drvdata(dev);
+	struct dev_ext_attribute *ea =
+		container_of(attr, struct dev_ext_attribute, attr);
+	int cap = (long)ea->var;
+
+	return sysfs_emit(buf, "0x%llx\n", arm_spe_pmu_cap_get(spe_pmu, cap));
 }
 
 #define SPE_EXT_ATTR_ENTRY(_name, _func, _var)				\
@@ -159,12 +174,15 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
 
 #define SPE_CAP_EXT_ATTR_ENTRY(_name, _var)				\
 	SPE_EXT_ATTR_ENTRY(_name, arm_spe_pmu_cap_show, _var)
+#define SPE_CAP_EXT_ATTR_ENTRY_HEX(_name, _var)				\
+	SPE_EXT_ATTR_ENTRY(_name, arm_spe_pmu_cap_show_hex, _var)
 
 static struct attribute *arm_spe_pmu_cap_attr[] = {
 	SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST),
 	SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
 	SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
 	SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
+	SPE_CAP_EXT_ATTR_ENTRY_HEX(event_filter, SPE_PMU_CAP_EVENT_FILTER),
 	NULL,
 };
 

-- 
2.34.1



* [PATCH v4 03/15] perf arm_spe: Correct setting remote access
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
  2025-07-31 13:25 ` [PATCH v4 01/15] perf: arm_spe: Support FEAT_SPEv1p4 filters Leo Yan
  2025-07-31 13:25 ` [PATCH v4 02/15] perf: arm_spe: Expose event filter Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 04/15] perf arm_spe: Correct memory level for " Leo Yan
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

Set the mem_remote field for a remote access to appropriately represent
the event.

Fixes: a89dbc9b988f ("perf arm-spe: Set sample's data source field")
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 8942fa598a84fb70f8c66649395bfbedcd42d380..8ecf7142dcd8705c13c27e0cbacd69647dc4fd22 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -839,7 +839,7 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
 	}
 
 	if (record->type & ARM_SPE_REMOTE_ACCESS)
-		data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
+		data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
 }
 
 static bool arm_spe__synth_ds(struct arm_spe_queue *speq,

-- 
2.34.1



* [PATCH v4 04/15] perf arm_spe: Correct memory level for remote access
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (2 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 03/15] perf arm_spe: Correct setting remote access Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 05/15] perf arm_spe: Use full type for data_src Leo Yan
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

For remote accesses, the data source packet does not contain information
about the memory level. To avoid misinformation, set the memory level to
NA (Not Available).

Fixes: 4e6430cbb1a9 ("perf arm-spe: Use SPE data source for neoverse cores")
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 8ecf7142dcd8705c13c27e0cbacd69647dc4fd22..3086dad92965af3981b1065f2892a3e67df7b616 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -670,8 +670,8 @@ static void arm_spe__synth_data_source_common(const struct arm_spe_record *recor
 	 * socket
 	 */
 	case ARM_SPE_COMMON_DS_REMOTE:
-		data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1;
-		data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
+		data_src->mem_lvl = PERF_MEM_LVL_NA;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
 		data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
 		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
 		break;

-- 
2.34.1



* [PATCH v4 05/15] perf arm_spe: Use full type for data_src
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (3 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 04/15] perf arm_spe: Correct memory level for " Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 06/15] perf arm_spe: Directly propagate raw event Leo Yan
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

From: James Clark <james.clark@linaro.org>

data_src has an actual type rather than just being a u64. To help
readers, delay decomposing it to a u64 until it's finally assigned to
the sample.

Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 3086dad92965af3981b1065f2892a3e67df7b616..8aed942a3da22f71f2c302586a4e84d68600e597 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -471,7 +471,8 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
 }
 
 static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
-				     u64 spe_events_id, u64 data_src)
+				     u64 spe_events_id,
+				     union perf_mem_data_src data_src)
 {
 	struct arm_spe *spe = speq->spe;
 	struct arm_spe_record *record = &speq->decoder->record;
@@ -486,7 +487,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
 	sample.stream_id = spe_events_id;
 	sample.addr = record->virt_addr;
 	sample.phys_addr = record->phys_addr;
-	sample.data_src = data_src;
+	sample.data_src = data_src.val;
 	sample.weight = record->latency;
 
 	ret = arm_spe_deliver_synth_event(spe, speq, event, &sample);
@@ -519,7 +520,8 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
 }
 
 static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
-					     u64 spe_events_id, u64 data_src)
+					     u64 spe_events_id,
+					     union perf_mem_data_src data_src)
 {
 	struct arm_spe *spe = speq->spe;
 	struct arm_spe_record *record = &speq->decoder->record;
@@ -542,7 +544,7 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
 	sample.stream_id = spe_events_id;
 	sample.addr = record->to_ip;
 	sample.phys_addr = record->phys_addr;
-	sample.data_src = data_src;
+	sample.data_src = data_src.val;
 	sample.period = spe->instructions_sample_period;
 	sample.weight = record->latency;
 	sample.flags = speq->flags;
@@ -891,21 +893,22 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
 	return false;
 }
 
-static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
-				      const struct arm_spe_record *record)
+static union perf_mem_data_src
+arm_spe__synth_data_source(struct arm_spe_queue *speq,
+			   const struct arm_spe_record *record)
 {
-	union perf_mem_data_src	data_src = { .mem_op = PERF_MEM_OP_NA };
+	union perf_mem_data_src	data_src = {};
 
 	/* Only synthesize data source for LDST operations */
 	if (!is_ldst_op(record->op))
-		return 0;
+		return data_src;
 
 	if (record->op & ARM_SPE_OP_LD)
 		data_src.mem_op = PERF_MEM_OP_LOAD;
 	else if (record->op & ARM_SPE_OP_ST)
 		data_src.mem_op = PERF_MEM_OP_STORE;
 	else
-		return 0;
+		return data_src;
 
 	if (!arm_spe__synth_ds(speq, record, &data_src))
 		arm_spe__synth_memory_level(record, &data_src);
@@ -919,14 +922,14 @@ static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
 			data_src.mem_dtlb |= PERF_MEM_TLB_HIT;
 	}
 
-	return data_src.val;
+	return data_src;
 }
 
 static int arm_spe_sample(struct arm_spe_queue *speq)
 {
 	const struct arm_spe_record *record = &speq->decoder->record;
 	struct arm_spe *spe = speq->spe;
-	u64 data_src;
+	union perf_mem_data_src data_src;
 	int err;
 
 	arm_spe__sample_flags(speq);

-- 
2.34.1



* [PATCH v4 06/15] perf arm_spe: Directly propagate raw event
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (4 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 05/15] perf arm_spe: Use full type for data_src Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 07/15] perf arm_spe: Decode event types for new features Leo Yan
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

Two sets of event bits are defined: one for generating samples and the
other for the raw event bits used in the backend decoder. Reduce the
redundancy by using the raw event bits directly in the frontend code.

To avoid overflow issues, change the type of the event variable from
enum to u64.
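
A minimal sketch of the idea, for illustration only. All SKETCH_ and
sketch_ names are hypothetical; the three event numbers are the ones
visible in arm-spe-pkt-decoder.h elsewhere in this thread. The 64-bit
type presumably matters because an events payload can carry bits that an
int-sized enum cannot hold.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit safe bit helper; a stand-in for the BIT() macro used by perf. */
#define SKETCH_BIT(n)	(UINT64_C(1) << (n))

/* Raw event bit positions, copied from the decoder header in this thread. */
enum sketch_events {
	SKETCH_EV_TRANSACTIONAL		= 16,
	SKETCH_EV_PARTIAL_PREDICATE	= 17,
	SKETCH_EV_EMPTY_PREDICATE	= 18,
};

/* Frontend flags are now plain aliases of the raw event bits. */
#define SKETCH_IN_TXN		SKETCH_BIT(SKETCH_EV_TRANSACTIONAL)
#define SKETCH_SVE_PARTIAL_PRED	SKETCH_BIT(SKETCH_EV_PARTIAL_PREDICATE)
#define SKETCH_SVE_EMPTY_PRED	SKETCH_BIT(SKETCH_EV_EMPTY_PREDICATE)

struct sketch_record {
	uint64_t type;	/* raw events payload, stored without translation */
};

/* The per-bit translation collapses into a single assignment. */
static void sketch_read_events(struct sketch_record *rec, uint64_t payload)
{
	assert(rec);
	rec->type = payload;
}

/* Frontend code tests the raw bits directly. */
static int sketch_in_txn(const struct sketch_record *rec)
{
	return (rec->type & SKETCH_IN_TXN) != 0;
}
```

Because the record keeps the whole payload, bits that have no frontend
flag yet (bit 40 in a test payload, say) are preserved rather than
dropped by the translation step.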

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 37 +----------------------
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 28 ++++++++---------
 2 files changed, 14 insertions(+), 51 deletions(-)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 688fe6d7524416420a7c18d5f8a268492ce7c3b8..96eb7cced6fd1574f5d823e4c67b9051dcf183ed 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -229,42 +229,7 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
 			}
 			break;
 		case ARM_SPE_EVENTS:
-			if (payload & BIT(EV_L1D_REFILL))
-				decoder->record.type |= ARM_SPE_L1D_MISS;
-
-			if (payload & BIT(EV_L1D_ACCESS))
-				decoder->record.type |= ARM_SPE_L1D_ACCESS;
-
-			if (payload & BIT(EV_TLB_WALK))
-				decoder->record.type |= ARM_SPE_TLB_MISS;
-
-			if (payload & BIT(EV_TLB_ACCESS))
-				decoder->record.type |= ARM_SPE_TLB_ACCESS;
-
-			if (payload & BIT(EV_LLC_MISS))
-				decoder->record.type |= ARM_SPE_LLC_MISS;
-
-			if (payload & BIT(EV_LLC_ACCESS))
-				decoder->record.type |= ARM_SPE_LLC_ACCESS;
-
-			if (payload & BIT(EV_REMOTE_ACCESS))
-				decoder->record.type |= ARM_SPE_REMOTE_ACCESS;
-
-			if (payload & BIT(EV_MISPRED))
-				decoder->record.type |= ARM_SPE_BRANCH_MISS;
-
-			if (payload & BIT(EV_NOT_TAKEN))
-				decoder->record.type |= ARM_SPE_BRANCH_NOT_TAKEN;
-
-			if (payload & BIT(EV_TRANSACTIONAL))
-				decoder->record.type |= ARM_SPE_IN_TXN;
-
-			if (payload & BIT(EV_PARTIAL_PREDICATE))
-				decoder->record.type |= ARM_SPE_SVE_PARTIAL_PRED;
-
-			if (payload & BIT(EV_EMPTY_PREDICATE))
-				decoder->record.type |= ARM_SPE_SVE_EMPTY_PRED;
-
+			decoder->record.type = payload;
 			break;
 		case ARM_SPE_DATA_SOURCE:
 			decoder->record.source = payload;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 881d9f29c1380b62486f0cd81498750ba06c4b50..03da55453da8fd2e7b9e2dcba3ddcf5243599e1c 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -13,20 +13,18 @@
 
 #include "arm-spe-pkt-decoder.h"
 
-enum arm_spe_sample_type {
-	ARM_SPE_L1D_ACCESS		= 1 << 0,
-	ARM_SPE_L1D_MISS		= 1 << 1,
-	ARM_SPE_LLC_ACCESS		= 1 << 2,
-	ARM_SPE_LLC_MISS		= 1 << 3,
-	ARM_SPE_TLB_ACCESS		= 1 << 4,
-	ARM_SPE_TLB_MISS		= 1 << 5,
-	ARM_SPE_BRANCH_MISS		= 1 << 6,
-	ARM_SPE_REMOTE_ACCESS		= 1 << 7,
-	ARM_SPE_SVE_PARTIAL_PRED	= 1 << 8,
-	ARM_SPE_SVE_EMPTY_PRED		= 1 << 9,
-	ARM_SPE_BRANCH_NOT_TAKEN	= 1 << 10,
-	ARM_SPE_IN_TXN			= 1 << 11,
-};
+#define ARM_SPE_L1D_ACCESS		BIT(EV_L1D_ACCESS)
+#define ARM_SPE_L1D_MISS		BIT(EV_L1D_REFILL)
+#define ARM_SPE_LLC_ACCESS		BIT(EV_LLC_ACCESS)
+#define ARM_SPE_LLC_MISS		BIT(EV_LLC_MISS)
+#define ARM_SPE_TLB_ACCESS		BIT(EV_TLB_ACCESS)
+#define ARM_SPE_TLB_MISS		BIT(EV_TLB_WALK)
+#define ARM_SPE_BRANCH_MISS		BIT(EV_MISPRED)
+#define ARM_SPE_BRANCH_NOT_TAKEN	BIT(EV_NOT_TAKEN)
+#define ARM_SPE_REMOTE_ACCESS		BIT(EV_REMOTE_ACCESS)
+#define ARM_SPE_SVE_PARTIAL_PRED	BIT(EV_PARTIAL_PREDICATE)
+#define ARM_SPE_SVE_EMPTY_PRED		BIT(EV_EMPTY_PREDICATE)
+#define ARM_SPE_IN_TXN			BIT(EV_TRANSACTIONAL)
 
 enum arm_spe_op_type {
 	/* First level operation type */
@@ -100,7 +98,7 @@ enum arm_spe_hisi_hip_data_source {
 };
 
 struct arm_spe_record {
-	enum arm_spe_sample_type type;
+	u64 type;
 	int err;
 	u32 op;
 	u32 latency;

-- 
2.34.1



* [PATCH v4 07/15] perf arm_spe: Decode event types for new features
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (5 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 06/15] perf arm_spe: Directly propagate raw event Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 08/15] perf arm_spe: Add "event_filter" entry in meta data Leo Yan
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

Decode new event types introduced by FEAT_SPEv1p4 and FEAT_SPE_SME.

The printed event names don't strictly follow the naming in the Arm ARM.
For example, the "Cache data modified" event is shown as "HITM", and the
"Data snooped" event is printed as "SNOOPED". Shorter names are easier
to read while preserving core meanings.
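
For illustration, the bit-to-name mapping can be sketched as a small
table. This is not how the patch implements it (the patch adds one
"if (payload & BIT(...))" per event); the sketch_ names are hypothetical,
while the bit numbers and strings are taken from the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_event_name {
	unsigned int bit;
	const char *name;
};

/* Bit positions and short names as added by this patch. */
static const struct sketch_event_name sketch_new_events[] = {
	{ 19, " L2D-ACCESS" },
	{ 20, " L2D-MISS" },
	{ 21, " HITM" },		/* "Cache data modified" */
	{ 22, " LFB" },			/* "Recently fetched" */
	{ 23, " SNOOPED" },		/* "Data snooped" */
	{ 24, " STREAMING-SVE" },
	{ 25, " SMCU" },
};

/* Write the names of all new events set in payload into buf. */
static char *sketch_desc_new_events(uint64_t payload, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(buf && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(sketch_new_events) / sizeof(sketch_new_events[0]); i++) {
		const char *name = sketch_new_events[i].name;
		size_t n = strlen(name);

		if (!(payload & (UINT64_C(1) << sketch_new_events[i].bit)))
			continue;
		if (used + n + 1 > len)
			break;	/* no room left; stop rather than cut a name */
		memcpy(buf + used, name, n + 1);
		used += n;
	}
	return buf;
}
```

A payload with bits 21 and 23 set yields " HITM SNOOPED", following the
table order rather than any ordering in the packet.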

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 14 ++++++++++++++
 tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h |  7 +++++++
 2 files changed, 21 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
index 13cadb2f1ceac7a90e359c4d6aa1d5fc5169e142..80561630253dd5c46f7e99b24fc13b99f346459f 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
@@ -314,6 +314,20 @@ static int arm_spe_pkt_desc_event(const struct arm_spe_pkt *packet,
 		arm_spe_pkt_out_string(&err, &buf, &buf_len, " SVE-PARTIAL-PRED");
 	if (payload & BIT(EV_EMPTY_PREDICATE))
 		arm_spe_pkt_out_string(&err, &buf, &buf_len, " SVE-EMPTY-PRED");
+	if (payload & BIT(EV_L2D_ACCESS))
+		arm_spe_pkt_out_string(&err, &buf, &buf_len, " L2D-ACCESS");
+	if (payload & BIT(EV_L2D_MISS))
+		arm_spe_pkt_out_string(&err, &buf, &buf_len, " L2D-MISS");
+	if (payload & BIT(EV_CACHE_DATA_MODIFIED))
+		arm_spe_pkt_out_string(&err, &buf, &buf_len, " HITM");
+	if (payload & BIT(EV_RECENTLY_FETCHED))
+		arm_spe_pkt_out_string(&err, &buf, &buf_len, " LFB");
+	if (payload & BIT(EV_DATA_SNOOPED))
+		arm_spe_pkt_out_string(&err, &buf, &buf_len, " SNOOPED");
+	if (payload & BIT(EV_STREAMING_SVE_MODE))
+		arm_spe_pkt_out_string(&err, &buf, &buf_len, " STREAMING-SVE");
+	if (payload & BIT(EV_SMCU))
+		arm_spe_pkt_out_string(&err, &buf, &buf_len, " SMCU");
 
 	return err;
 }
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index 2cdf9f6da2681244291445d54c5b13fe8a2e9d9a..d00c2481712dcc457eab2f5e9848ffc3150e6236 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -108,6 +108,13 @@ enum arm_spe_events {
 	EV_TRANSACTIONAL	= 16,
 	EV_PARTIAL_PREDICATE	= 17,
 	EV_EMPTY_PREDICATE	= 18,
+	EV_L2D_ACCESS		= 19,
+	EV_L2D_MISS		= 20,
+	EV_CACHE_DATA_MODIFIED	= 21,
+	EV_RECENTLY_FETCHED	= 22,
+	EV_DATA_SNOOPED		= 23,
+	EV_STREAMING_SVE_MODE	= 24,
+	EV_SMCU			= 25,
 };
 
 /* Operation packet header */

-- 
2.34.1



* [PATCH v4 08/15] perf arm_spe: Add "event_filter" entry in meta data
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (6 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 07/15] perf arm_spe: Decode event types for new features Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 09/15] perf arm_spe: Refine memory level filling Leo Yan
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

Add a new "event_filter" entry in the meta data and dump it in raw data
mode.

After:

  # perf script -D
  ...

  0 0 0x470 [0x1f0]: PERF_RECORD_AUXTRACE_INFO type: 4
    Header version     :2
    Header size        :4
    PMU type v2        :11
    CPU number         :8
      Magic            :0x1010101010101010
      CPU #            :0
      Num of params    :4
      MIDR             :0x410fd0f0
      PMU Type         :11
      Min Interval     :256
      Event Filter     :0x3fe08fe

  ...
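
As an illustration of how a consumer might use the value, the sketch
below parses text in the "0x%llx" format of caps/event_filter and tests
individual event bits. The sketch_ helpers are hypothetical and use
strtoull() rather than perf_pmu__scan_file(); the zero fallback on a
failed parse mirrors what this patch does on a failed read.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hex mask such as "0x3fe08fe\n"; returns 0 if nothing parses. */
static uint64_t sketch_parse_event_filter(const char *text)
{
	char *end;
	unsigned long long val;

	assert(text);
	errno = 0;
	val = strtoull(text, &end, 16);
	if (errno || end == text)
		return 0;	/* treat an unreadable value as "no filters" */
	return (uint64_t)val;
}

/* Report whether event number ev is set in the filter mask. */
static int sketch_event_filterable(uint64_t filter, unsigned int ev)
{
	return ev < 64 && (filter & (UINT64_C(1) << ev)) != 0;
}
```

Applied to the value dumped above (0x3fe08fe), bit 21
(EV_CACHE_DATA_MODIFIED) is reported as filterable and bit 16
(EV_TRANSACTIONAL) is not.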

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/arch/arm64/util/arm-spe.c | 5 +++++
 tools/perf/util/arm-spe.c            | 1 +
 tools/perf/util/arm-spe.h            | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 4f2833b62ff55f3fd1dff3f032d6e06528460939..cac43cde7dbee94884938482d03989a2c69cb0b1 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -121,12 +121,17 @@ static int arm_spe_save_cpu_header(struct auxtrace_record *itr,
 		/* No Arm SPE PMU is found */
 		data[ARM_SPE_CPU_PMU_TYPE] = ULLONG_MAX;
 		data[ARM_SPE_CAP_MIN_IVAL] = 0;
+		data[ARM_SPE_CAP_EVENT_FILTER] = 0;
 	} else {
 		data[ARM_SPE_CPU_PMU_TYPE] = pmu->type;
 
 		if (perf_pmu__scan_file(pmu, "caps/min_interval", "%lu", &val) != 1)
 			val = 0;
 		data[ARM_SPE_CAP_MIN_IVAL] = val;
+
+		if (perf_pmu__scan_file(pmu, "caps/event_filter", "%lx", &val) != 1)
+			val = 0;
+		data[ARM_SPE_CAP_EVENT_FILTER] = val;
 	}
 
 	free(cpuid);
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 8aed942a3da22f71f2c302586a4e84d68600e597..727e43f538190cf5de8ef8d5ad3546a871c858eb 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -1535,6 +1535,7 @@ static const char * const metadata_per_cpu_fmts[] = {
 	[ARM_SPE_CPU_MIDR]		= "    MIDR             :0x%"PRIx64"\n",
 	[ARM_SPE_CPU_PMU_TYPE]		= "    PMU Type         :%"PRId64"\n",
 	[ARM_SPE_CAP_MIN_IVAL]		= "    Min Interval     :%"PRId64"\n",
+	[ARM_SPE_CAP_EVENT_FILTER]	= "    Event Filter     :0x%"PRIx64"\n",
 };
 
 static void arm_spe_print_info(struct arm_spe *spe, __u64 *arr)
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 390679a4af2fb61419bc881b5dc43c01f1dd77d7..3966df1856d8234bb5fe580c3f128c4d238c6221 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -47,6 +47,8 @@ enum {
 	ARM_SPE_CPU_PMU_TYPE,
 	/* Minimal interval */
 	ARM_SPE_CAP_MIN_IVAL,
+	/* Event filter */
+	ARM_SPE_CAP_EVENT_FILTER,
 	ARM_SPE_CPU_PRIV_MAX,
 };
 

-- 
2.34.1



* [PATCH v4 09/15] perf arm_spe: Refine memory level filling
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (7 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 08/15] perf arm_spe: Add "event_filter" entry in meta data Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 10/15] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

Introduce macros for detecting the cache level and cache misses.

Populate the 'mem_lvl_num' field, a later-added attribute for
representing the memory level. Set the memory levels to NA ("not
available") if memory hierarchy info is absent.
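
The macro pattern and the NA fallback can be tried outside of perf with a
small standalone sketch. The EV_* bit positions, the level enum and
classify_level() below are assumptions made up for illustration; only the
token-pasting idea and the LLC-before-L1D order mirror the patch.

```c
#include <stdint.h>

/* Hypothetical event bit positions, chosen only for this sketch. */
#define BIT(n)			(1ULL << (n))
#define EV_L1D_ACCESS		BIT(0)
#define EV_L1D_MISS		BIT(1)
#define EV_LLC_ACCESS		BIT(2)
#define EV_LLC_MISS		BIT(3)

/* Token pasting builds the ACCESS/MISS pair for a given level name. */
#define CACHE_EVENT(lvl)		(EV_##lvl##_ACCESS | EV_##lvl##_MISS)
#define is_cache_level(type, lvl)	(!!((type) & CACHE_EVENT(lvl)))
#define is_cache_miss(type, lvl)	(!!((type) & EV_##lvl##_MISS))

enum level { LEVEL_NA, LEVEL_L1, LEVEL_L3 };

/* Check LLC first, then L1D; report "not available" when neither is set. */
static enum level classify_level(uint64_t type)
{
	if (is_cache_level(type, LLC))
		return LEVEL_L3;
	if (is_cache_level(type, L1D))
		return LEVEL_L1;
	return LEVEL_NA;
}
```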

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 727e43f538190cf5de8ef8d5ad3546a871c858eb..4f46d796ffd2f15068329b00e7f0747c80cd7cf8 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -39,6 +39,15 @@
 
 #define is_ldst_op(op)		(!!((op) & ARM_SPE_OP_LDST))
 
+#define ARM_SPE_CACHE_EVENT(lvl) \
+	(ARM_SPE_##lvl##_ACCESS | ARM_SPE_##lvl##_MISS)
+
+#define arm_spe_is_cache_level(type, lvl) \
+	((type) & ARM_SPE_CACHE_EVENT(lvl))
+
+#define arm_spe_is_cache_miss(type, lvl) \
+	((type) & ARM_SPE_##lvl##_MISS)
+
 struct arm_spe {
 	struct auxtrace			auxtrace;
 	struct auxtrace_queues		queues;
@@ -824,20 +833,21 @@ static const struct data_source_handle data_source_handles[] = {
 static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
 					union perf_mem_data_src *data_src)
 {
-	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
+	if (arm_spe_is_cache_level(record->type, LLC)) {
 		data_src->mem_lvl = PERF_MEM_LVL_L3;
-
-		if (record->type & ARM_SPE_LLC_MISS)
-			data_src->mem_lvl |= PERF_MEM_LVL_MISS;
-		else
-			data_src->mem_lvl |= PERF_MEM_LVL_HIT;
-	} else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
+		data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
+				     PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+	} else if (arm_spe_is_cache_level(record->type, L1D)) {
 		data_src->mem_lvl = PERF_MEM_LVL_L1;
+		data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L1D) ?
+				     PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+	}
 
-		if (record->type & ARM_SPE_L1D_MISS)
-			data_src->mem_lvl |= PERF_MEM_LVL_MISS;
-		else
-			data_src->mem_lvl |= PERF_MEM_LVL_HIT;
+	if (!data_src->mem_lvl) {
+		data_src->mem_lvl = PERF_MEM_LVL_NA;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
 	}
 
 	if (record->type & ARM_SPE_REMOTE_ACCESS)

-- 
2.34.1



* [PATCH v4 10/15] perf arm_spe: Separate setting of memory levels for loads and stores
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (8 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 09/15] perf arm_spe: Refine memory level filling Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 11/15] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

For a load hit, the lowest-level cache that hits reflects the latency of
fetching the data. Otherwise, the highest-level cache involved in the
refill indicates the overhead caused by the load.

Store operations remain unchanged, keeping the descending order when
iterating through cache levels.

Split the code into two functions: one sets memory levels for loads and
the other for stores.
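
The two search directions for loads can be illustrated with a standalone
sketch. The bit positions, struct mem_level and load_level() are
hypothetical names for this example only; the order of the checks follows
the description above (hits from the nearest cache outwards, misses from
the furthest cache inwards).

```c
#include <stdint.h>

/* Hypothetical event bit positions, chosen only for this sketch. */
#define EV_L1D_ACCESS	(1ULL << 0)
#define EV_L1D_MISS	(1ULL << 1)
#define EV_LLC_ACCESS	(1ULL << 2)
#define EV_LLC_MISS	(1ULL << 3)

/* A hit means the ACCESS bit is set and the MISS bit is clear. */
#define IS_HIT(type, access, miss) \
	(((type) & ((access) | (miss))) == (access))

struct mem_level {
	int level;	/* 1 = L1, 3 = L3, 0 = not available */
	int hit;	/* 1 = hit, 0 = miss */
};

static struct mem_level load_level(uint64_t type)
{
	/* Hits: nearest cache first, the best case for latency. */
	if (IS_HIT(type, EV_L1D_ACCESS, EV_L1D_MISS))
		return (struct mem_level){ 1, 1 };
	if (IS_HIT(type, EV_LLC_ACCESS, EV_LLC_MISS))
		return (struct mem_level){ 3, 1 };

	/* Misses: furthest cache first, the worst case for overhead. */
	if (type & EV_LLC_MISS)
		return (struct mem_level){ 3, 0 };
	if (type & EV_L1D_MISS)
		return (struct mem_level){ 1, 0 };

	return (struct mem_level){ 0, 0 };
}
```

With this order, a load that missed L1D but hit in the LLC is reported as
an L3 hit rather than an L1 miss.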

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 4f46d796ffd2f15068329b00e7f0747c80cd7cf8..635e87e10a38c27127091c244d3ec145d43d7aa7 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -45,6 +45,9 @@
 #define arm_spe_is_cache_level(type, lvl) \
 	((type) & ARM_SPE_CACHE_EVENT(lvl))
 
+#define arm_spe_is_cache_hit(type, lvl) \
+	(((type) & ARM_SPE_CACHE_EVENT(lvl)) == ARM_SPE_##lvl##_ACCESS)
+
 #define arm_spe_is_cache_miss(type, lvl) \
 	((type) & ARM_SPE_##lvl##_MISS)
 
@@ -830,9 +833,38 @@ static const struct data_source_handle data_source_handles[] = {
 	DS(hisi_hip_ds_encoding_cpus, data_source_hisi_hip),
 };
 
-static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
-					union perf_mem_data_src *data_src)
+static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
+					   union perf_mem_data_src *data_src)
+{
+	/*
+	 * To find a cache hit, search in ascending order from the lower level
+	 * caches to the higher level caches. This reflects the best scenario
+	 * for a cache hit.
+	 */
+	if (arm_spe_is_cache_hit(record->type, L1D)) {
+		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+	} else if (arm_spe_is_cache_hit(record->type, LLC)) {
+		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+	/*
+	 * To find a cache miss, search in descending order from the higher
+	 * level cache to the lower level cache. This represents the worst
+	 * scenario for a cache miss.
+	 */
+	} else if (arm_spe_is_cache_miss(record->type, LLC)) {
+		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_MISS;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+	} else if (arm_spe_is_cache_miss(record->type, L1D)) {
+		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_MISS;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+	}
+}
+
+static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
+					   union perf_mem_data_src *data_src)
 {
+	/* Record the greatest level info for a store operation. */
 	if (arm_spe_is_cache_level(record->type, LLC)) {
 		data_src->mem_lvl = PERF_MEM_LVL_L3;
 		data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
@@ -844,6 +876,15 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
 				     PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
 		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
 	}
+}
+
+static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
+					union perf_mem_data_src *data_src)
+{
+	if (data_src->mem_op == PERF_MEM_OP_LOAD)
+		arm_spe__synth_ld_memory_level(record, data_src);
+	if (data_src->mem_op == PERF_MEM_OP_STORE)
+		arm_spe__synth_st_memory_level(record, data_src);
 
 	if (!data_src->mem_lvl) {
 		data_src->mem_lvl = PERF_MEM_LVL_NA;

-- 
2.34.1



* [PATCH v4 11/15] perf arm_spe: Fill memory levels for FEAT_SPEv1p4
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (9 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 10/15] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 12/15] perf arm_spe: Improve CPU number retrieving in per-thread mode Leo Yan
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

Starting with FEAT_SPEv1p4, Arm SPE provides information on Level 2 data
cache and recently fetched events. This patch fills in the memory levels
for these new events.

The recently fetched events are mapped to the line-fill buffer (LFB). In
general, the latency of accessing the LFB is higher than that of the L1
cache but lower than that of the L2 cache. Thus, it is placed between the
L1 cache and the L2 cache in the memory hierarchy information.
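
The resulting hit order (L1, LFB, L2, L3) can also be written as an
ordered table, which makes the position of the LFB explicit. This is only
an illustrative alternative to the if/else chain in the patch; the bit
positions, struct hit_rule and load_hit_level() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event bit positions, chosen only for this sketch. */
#define EV_L1D_ACCESS		(1ULL << 0)
#define EV_L1D_MISS		(1ULL << 1)
#define EV_L2D_ACCESS		(1ULL << 2)
#define EV_L2D_MISS		(1ULL << 3)
#define EV_LLC_ACCESS		(1ULL << 4)
#define EV_LLC_MISS		(1ULL << 5)
#define EV_RECENTLY_FETCHED	(1ULL << 6)

struct hit_rule {
	uint64_t mask;		/* bits to inspect */
	uint64_t want;		/* value that means "hit at this level" */
	const char *name;
};

/* Ordered from the lowest expected latency to the highest. */
static const struct hit_rule hit_rules[] = {
	{ EV_L1D_ACCESS | EV_L1D_MISS, EV_L1D_ACCESS,       "L1"  },
	{ EV_RECENTLY_FETCHED,         EV_RECENTLY_FETCHED, "LFB" },
	{ EV_L2D_ACCESS | EV_L2D_MISS, EV_L2D_ACCESS,       "L2"  },
	{ EV_LLC_ACCESS | EV_LLC_MISS, EV_LLC_ACCESS,       "L3"  },
};

/* Return the name of the first level that reports a hit, or NULL. */
static const char *load_hit_level(uint64_t type)
{
	size_t i;

	for (i = 0; i < sizeof(hit_rules) / sizeof(hit_rules[0]); i++)
		if ((type & hit_rules[i].mask) == hit_rules[i].want)
			return hit_rules[i].name;

	return NULL;
}
```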

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  3 +++
 tools/perf/util/arm-spe.c                         | 14 ++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 03da55453da8fd2e7b9e2dcba3ddcf5243599e1c..3afa8703b21db9d231eef93fe981e0c20d562e83 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -25,6 +25,9 @@
 #define ARM_SPE_SVE_PARTIAL_PRED	BIT(EV_PARTIAL_PREDICATE)
 #define ARM_SPE_SVE_EMPTY_PRED		BIT(EV_EMPTY_PREDICATE)
 #define ARM_SPE_IN_TXN			BIT(EV_TRANSACTIONAL)
+#define ARM_SPE_L2D_ACCESS		BIT(EV_L2D_ACCESS)
+#define ARM_SPE_L2D_MISS		BIT(EV_L2D_MISS)
+#define ARM_SPE_RECENTLY_FETCHED	BIT(EV_RECENTLY_FETCHED)
 
 enum arm_spe_op_type {
 	/* First level operation type */
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 635e87e10a38c27127091c244d3ec145d43d7aa7..db681dd2aed205655e77f3dfcd7eb3c33f20277a 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -844,6 +844,12 @@ static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
 	if (arm_spe_is_cache_hit(record->type, L1D)) {
 		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
 		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+	} else if (record->type & ARM_SPE_RECENTLY_FETCHED) {
+		data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_LFB;
+	} else if (arm_spe_is_cache_hit(record->type, L2D)) {
+		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
 	} else if (arm_spe_is_cache_hit(record->type, LLC)) {
 		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
 		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
@@ -855,6 +861,9 @@ static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
 	} else if (arm_spe_is_cache_miss(record->type, LLC)) {
 		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_MISS;
 		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+	} else if (arm_spe_is_cache_miss(record->type, L2D)) {
+		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_MISS;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
 	} else if (arm_spe_is_cache_miss(record->type, L1D)) {
 		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_MISS;
 		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
@@ -870,6 +879,11 @@ static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
 		data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
 				     PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
 		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+	} else if (arm_spe_is_cache_level(record->type, L2D)) {
+		data_src->mem_lvl = PERF_MEM_LVL_L2;
+		data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L2D) ?
+				     PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
 	} else if (arm_spe_is_cache_level(record->type, L1D)) {
 		data_src->mem_lvl = PERF_MEM_LVL_L1;
 		data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L1D) ?

-- 
2.34.1



* [PATCH v4 12/15] perf arm_spe: Improve CPU number retrieving in per-thread mode
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (10 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 11/15] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 13/15] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

In per-thread mode on a homogeneous system, the current code simply
picks the first metadata entry for data source parsing.

This change improves that by using the PMU type to find the matching PMU
event. From there, it reads the CPU map and uses the first CPU ID to
fetch the metadata.

Although this makes no difference when there's only one Arm SPE PMU, it
prepares for future support of multiple SPE events.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index db681dd2aed205655e77f3dfcd7eb3c33f20277a..2ac10b8008527a066c9b2f67a22eb8511af9239a 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -914,6 +914,9 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
 			      union perf_mem_data_src *data_src)
 {
 	struct arm_spe *spe = speq->spe;
+	struct perf_cpu_map *cpus;
+	struct perf_cpu perf_cpu;
+	int16_t cpu_nr;
 	u64 *metadata = NULL;
 	u64 midr;
 	unsigned int i;
@@ -935,13 +938,21 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
 			if (!spe->is_homogeneous)
 				return false;
 
-			/* In homogeneous system, simply use CPU0's metadata */
-			if (spe->metadata)
-				metadata = spe->metadata[0];
+			cpus = perf_pmus__find_by_type(spe->pmu_type)->cpus;
+			if (!cpus)
+				return false;
+
+			/* In a homogeneous system, fetch the first CPU in the map. */
+			perf_cpu = perf_cpu_map__cpu(cpus, 0);
+			if (perf_cpu.cpu == -1)
+				return false;
+
+			cpu_nr = perf_cpu.cpu;
 		} else {
-			metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
+			cpu_nr = speq->cpu;
 		}
 
+		metadata = arm_spe__get_metadata_by_cpu(spe, cpu_nr);
 		if (!metadata)
 			return false;
 

-- 
2.34.1



* [PATCH v4 13/15] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu()
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (11 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 12/15] perf arm_spe: Improve CPU number retrieving in per-thread mode Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 14/15] perf arm_spe: Set HITM flag Leo Yan
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

Handle "CPU=-1" (per-thread mode) in the arm_spe__get_metadata_by_cpu()
function. As a result, the function is more general and will be invoked
by a subsequent change.
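
The lookup rule can be sketched without any perf data structures. In the
sketch below, struct meta_table and find_metadata() are hypothetical; the
'first_pmu_cpu' field stands in for the first CPU of the PMU's CPU map,
and the function returns an index rather than a metadata pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the per-CPU metadata. */
struct meta_table {
	const uint64_t *cpu_ids;	/* CPU ID stored in each entry */
	size_t nr;			/* number of entries */
	int is_homogeneous;		/* all CPUs share one implementation */
	int first_pmu_cpu;		/* first CPU in the PMU map, -1 if none */
};

/* Return the index of the matching entry, or -1 if there is none. */
static int find_metadata(const struct meta_table *t, int cpu)
{
	size_t i;

	if (!t || !t->cpu_ids)
		return -1;

	/* Per-thread mode: no CPU is known for the queue. */
	if (cpu < 0) {
		/* Without a CPU, a heterogeneous system is ambiguous. */
		if (!t->is_homogeneous)
			return -1;
		if (t->first_pmu_cpu < 0)
			return -1;
		cpu = t->first_pmu_cpu;
	}

	for (i = 0; i < t->nr; i++)
		if (t->cpu_ids[i] == (uint64_t)cpu)
			return (int)i;

	return -1;
}
```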

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 55 ++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 2ac10b8008527a066c9b2f67a22eb8511af9239a..3c32b9bc9c983370e37d871adeaa610bf5905c1d 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -317,15 +317,38 @@ static int arm_spe_set_tid(struct arm_spe_queue *speq, pid_t tid)
 	return 0;
 }
 
-static u64 *arm_spe__get_metadata_by_cpu(struct arm_spe *spe, u64 cpu)
+static u64 *arm_spe__get_metadata_by_cpu(struct arm_spe *spe, int cpu)
 {
+	struct perf_cpu_map *cpus;
+	struct perf_cpu perf_cpu;
 	u64 i;
 
 	if (!spe->metadata)
 		return NULL;
 
+	/* CPU ID is -1 for per-thread mode */
+	if (cpu < 0) {
+		/*
+		 * On the heterogeneous system, due to CPU ID is -1,
+		 * cannot confirm the data source packet is supported.
+		 */
+		if (!spe->is_homogeneous)
+			return NULL;
+
+		cpus = perf_pmus__find_by_type(spe->pmu_type)->cpus;
+		if (!cpus)
+			return NULL;
+
+		/* In a homogeneous system, fetch the first CPU in the map. */
+		perf_cpu = perf_cpu_map__cpu(cpus, 0);
+		if (perf_cpu.cpu == -1)
+			return NULL;
+
+		cpu = perf_cpu.cpu;
+	}
+
 	for (i = 0; i < spe->metadata_nr_cpu; i++)
-		if (spe->metadata[i][ARM_SPE_CPU] == cpu)
+		if (spe->metadata[i][ARM_SPE_CPU] == (u64)cpu)
 			return spe->metadata[i];
 
 	return NULL;
@@ -914,9 +937,6 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
 			      union perf_mem_data_src *data_src)
 {
 	struct arm_spe *spe = speq->spe;
-	struct perf_cpu_map *cpus;
-	struct perf_cpu perf_cpu;
-	int16_t cpu_nr;
 	u64 *metadata = NULL;
 	u64 midr;
 	unsigned int i;
@@ -929,30 +949,7 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
 		cpuid = perf_env__cpuid(perf_session__env(spe->session));
 		midr = strtol(cpuid, NULL, 16);
 	} else {
-		/* CPU ID is -1 for per-thread mode */
-		if (speq->cpu < 0) {
-			/*
-			 * On the heterogeneous system, due to CPU ID is -1,
-			 * cannot confirm the data source packet is supported.
-			 */
-			if (!spe->is_homogeneous)
-				return false;
-
-			cpus = perf_pmus__find_by_type(spe->pmu_type)->cpus;
-			if (!cpus)
-				return false;
-
-			/* In a homogeneous system, fetch the first CPU in the map. */
-			perf_cpu = perf_cpu_map__cpu(cpus, 0);
-			if (perf_cpu.cpu == -1)
-				return false;
-
-			cpu_nr = perf_cpu.cpu;
-		} else {
-			cpu_nr = speq->cpu;
-		}
-
-		metadata = arm_spe__get_metadata_by_cpu(spe, cpu_nr);
+		metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
 		if (!metadata)
 			return false;
 

-- 
2.34.1



* [PATCH v4 14/15] perf arm_spe: Set HITM flag
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (12 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 13/15] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-07-31 13:25 ` [PATCH v4 15/15] perf arm_spe: Allow parsing both data source and events Leo Yan
  2025-08-29  8:00 ` [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

Since FEAT_SPEv1p4, Arm SPE provides two extra events: "Cache data
modified" and "Data snooped".

Set the snoop mode as follows:

- If both the "Cache data modified" event and the "Data snooped" event
  are set, which indicates a load operation that snooped from an outside
  cache and hit a modified copy, set the HITM flag so that false sharing
  can be inspected.
- If only the "Data snooped" event is set, set the HIT flag.
- If the snooped event bit is not set and the snooped event is
  supported by the hardware, set NONE mode (no snoop operation).
- If the snooped event bit is not set and the event is not supported, or
  the event info is absent from the meta data, set NA mode (not
  available).

The "Cache data modified" event alone does not set HITM, as it hits a
local modified copy.
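
The decision can be condensed into a small standalone function. This is a
sketch under assumptions: the two bit positions, enum snoop_mode and
classify_snoop() are hypothetical names, and 'have_caps' models whether
meta data with an event filter was found at all.

```c
#include <stdint.h>

/* Hypothetical event bit positions, chosen only for this sketch. */
#define EV_DATA_SNOOPED		(1ULL << 0)
#define EV_DATA_MODIFIED	(1ULL << 1)

enum snoop_mode { SNOOP_NA, SNOOP_NONE, SNOOP_HIT, SNOOP_HITM };

/*
 * 'type' is the event bitmap of one record, 'event_filter' is the set of
 * events the hardware advertises, and 'have_caps' is 0 when no meta data
 * is available.
 */
static enum snoop_mode classify_snoop(uint64_t type, int have_caps,
				      uint64_t event_filter)
{
	if (type & EV_DATA_SNOOPED)
		return (type & EV_DATA_MODIFIED) ? SNOOP_HITM : SNOOP_HIT;

	/* No snoop bit: distinguish "not supported" from "no snoop". */
	if (!have_caps || !(event_filter & EV_DATA_SNOOPED))
		return SNOOP_NA;

	return SNOOP_NONE;
}
```

Note that a record carrying only the modified bit goes through the second
half of the function, so it ends up as NONE or NA rather than HITM.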

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  2 ++
 tools/perf/util/arm-spe.c                         | 26 +++++++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 3afa8703b21db9d231eef93fe981e0c20d562e83..fbb57f8052371e51d562d9dd6098e97fc099461c 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -28,6 +28,8 @@
 #define ARM_SPE_L2D_ACCESS		BIT(EV_L2D_ACCESS)
 #define ARM_SPE_L2D_MISS		BIT(EV_L2D_MISS)
 #define ARM_SPE_RECENTLY_FETCHED	BIT(EV_RECENTLY_FETCHED)
+#define ARM_SPE_DATA_SNOOPED		BIT(EV_DATA_SNOOPED)
+#define ARM_SPE_HITM			BIT(EV_CACHE_DATA_MODIFIED)
 
 enum arm_spe_op_type {
 	/* First level operation type */
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 3c32b9bc9c983370e37d871adeaa610bf5905c1d..17d12bfa812c86a796dd79624a3d283f0cbac237 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -915,9 +915,12 @@ static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
 	}
 }
 
-static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
+static void arm_spe__synth_memory_level(struct arm_spe_queue *speq,
+					const struct arm_spe_record *record,
 					union perf_mem_data_src *data_src)
 {
+	struct arm_spe *spe = speq->spe;
+
 	if (data_src->mem_op == PERF_MEM_OP_LOAD)
 		arm_spe__synth_ld_memory_level(record, data_src);
 	if (data_src->mem_op == PERF_MEM_OP_STORE)
@@ -928,6 +931,25 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
 		data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
 	}
 
+	if (record->type & ARM_SPE_DATA_SNOOPED) {
+		if (record->type & ARM_SPE_HITM)
+			data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+		else
+			data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
+	} else {
+		u64 *metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
+
+		/*
+		 * Set NA ("Not available") mode if no meta data or the
+		 * SNOOPED event is not supported.
+		 */
+		if (!metadata ||
+		    !(metadata[ARM_SPE_CAP_EVENT_FILTER] & ARM_SPE_DATA_SNOOPED))
+			data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+		else
+			data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+	}
+
 	if (record->type & ARM_SPE_REMOTE_ACCESS)
 		data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
 }
@@ -984,7 +1006,7 @@ arm_spe__synth_data_source(struct arm_spe_queue *speq,
 		return data_src;
 
 	if (!arm_spe__synth_ds(speq, record, &data_src))
-		arm_spe__synth_memory_level(record, &data_src);
+		arm_spe__synth_memory_level(speq, record, &data_src);
 
 	if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
 		data_src.mem_dtlb = PERF_MEM_TLB_WK;

-- 
2.34.1



* [PATCH v4 15/15] perf arm_spe: Allow parsing both data source and events
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (13 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 14/15] perf arm_spe: Set HITM flag Leo Yan
@ 2025-07-31 13:25 ` Leo Yan
  2025-08-29  8:00 ` [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-31 13:25 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
	Leo Yan

The current code skips parsing events after generating the data source.
The reason is that the data source packets carry cache and snooping
related info, so the subsequent event packets might contain duplicate
info.

Change this to continue parsing the events after data source analysis.
If the data source does not provide the memory level and snooping types,
the event info is used to synthesize the related fields.

As a result, both the peer snoop option ('-d peer') and the HITM options
('-d tot/lcl/rmt') are supported by Arm SPE in perf c2c.
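
The "data source first, events as fallback" rule amounts to filling only
the fields that are still zero. A minimal sketch, assuming a hypothetical
struct data_src with three plain integer fields where 0 means "unset"; it
is not the real union perf_mem_data_src.

```c
/* Hypothetical, simplified data source: 0 means "not set yet". */
struct data_src {
	unsigned int lvl;
	unsigned int snoop;
	unsigned int remote;
};

/*
 * Merge event-derived values into 'ds' without overriding anything that
 * the data source packet has already provided.
 */
static void fill_from_events(struct data_src *ds,
			     const struct data_src *from_events)
{
	if (!ds->lvl)
		ds->lvl = from_events->lvl;
	if (!ds->snoop)
		ds->snoop = from_events->snoop;
	if (!ds->remote)
		ds->remote = from_events->remote;
}
```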

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 75 ++++++++++++++++++++++++++++-------------------
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 17d12bfa812c86a796dd79624a3d283f0cbac237..a2925a5cfe980f1c72a8748b02fbe238cc7b8e00 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -921,40 +921,56 @@ static void arm_spe__synth_memory_level(struct arm_spe_queue *speq,
 {
 	struct arm_spe *spe = speq->spe;
 
-	if (data_src->mem_op == PERF_MEM_OP_LOAD)
-		arm_spe__synth_ld_memory_level(record, data_src);
-	if (data_src->mem_op == PERF_MEM_OP_STORE)
-		arm_spe__synth_st_memory_level(record, data_src);
+	/*
+	 * The data source packet contains more info for cache levels for
+	 * peer snooping. So respect the memory level if has been set by
+	 * data source parsing.
+	 */
+	if (!data_src->mem_lvl) {
+		if (data_src->mem_op == PERF_MEM_OP_LOAD)
+			arm_spe__synth_ld_memory_level(record, data_src);
+		if (data_src->mem_op == PERF_MEM_OP_STORE)
+			arm_spe__synth_st_memory_level(record, data_src);
+	}
 
 	if (!data_src->mem_lvl) {
 		data_src->mem_lvl = PERF_MEM_LVL_NA;
 		data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
 	}
 
-	if (record->type & ARM_SPE_DATA_SNOOPED) {
-		if (record->type & ARM_SPE_HITM)
-			data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
-		else
-			data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
-	} else {
-		u64 *metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
-
-		/*
-		 * Set NA ("Not available") mode if no meta data or the
-		 * SNOOPED event is not supported.
-		 */
-		if (!metadata ||
-		    !(metadata[ARM_SPE_CAP_EVENT_FILTER] & ARM_SPE_DATA_SNOOPED))
-			data_src->mem_snoop = PERF_MEM_SNOOP_NA;
-		else
-			data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+	/*
+	 * If 'mem_snoop' has been set by data source packet, skip to set
+	 * it at here.
+	 */
+	if (!data_src->mem_snoop) {
+		if (record->type & ARM_SPE_DATA_SNOOPED) {
+			if (record->type & ARM_SPE_HITM)
+				data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+			else
+				data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
+		} else {
+			u64 *metadata =
+				arm_spe__get_metadata_by_cpu(spe, speq->cpu);
+
+			/*
+			 * Set NA ("Not available") mode if no meta data or the
+			 * SNOOPED event is not supported.
+			 */
+			if (!metadata ||
+			    !(metadata[ARM_SPE_CAP_EVENT_FILTER] & ARM_SPE_DATA_SNOOPED))
+				data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+			else
+				data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+		}
 	}
 
-	if (record->type & ARM_SPE_REMOTE_ACCESS)
-		data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+	if (!data_src->mem_remote) {
+		if (record->type & ARM_SPE_REMOTE_ACCESS)
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+	}
 }
 
-static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
+static void arm_spe__synth_ds(struct arm_spe_queue *speq,
 			      const struct arm_spe_record *record,
 			      union perf_mem_data_src *data_src)
 {
@@ -973,19 +989,18 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
 	} else {
 		metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
 		if (!metadata)
-			return false;
+			return;
 
 		midr = metadata[ARM_SPE_CPU_MIDR];
 	}
 
 	for (i = 0; i < ARRAY_SIZE(data_source_handles); i++) {
 		if (is_midr_in_range_list(midr, data_source_handles[i].midr_ranges)) {
-			data_source_handles[i].ds_synth(record, data_src);
-			return true;
+			return data_source_handles[i].ds_synth(record, data_src);
 		}
 	}
 
-	return false;
+	return;
 }
 
 static union perf_mem_data_src
@@ -1005,8 +1020,8 @@ arm_spe__synth_data_source(struct arm_spe_queue *speq,
 	else
 		return data_src;
 
-	if (!arm_spe__synth_ds(speq, record, &data_src))
-		arm_spe__synth_memory_level(speq, record, &data_src);
+	arm_spe__synth_ds(speq, record, &data_src);
+	arm_spe__synth_memory_level(speq, record, &data_src);
 
 	if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
 		data_src.mem_dtlb = PERF_MEM_TLB_WK;

-- 
2.34.1



* Re: [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4
  2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
                   ` (14 preceding siblings ...)
  2025-07-31 13:25 ` [PATCH v4 15/15] perf arm_spe: Allow parsing both data source and events Leo Yan
@ 2025-08-29  8:00 ` Leo Yan
  15 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-08-29  8:00 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, German Gomez, Ali Saidi
  Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users

Hi Will,

On Thu, Jul 31, 2025 at 02:25:35PM +0100, Leo Yan wrote:
> This series adds support for new event types introduced in Arm SPE v1.4.
> 
> The first two patches modify the SPE driver to expose 'event_filter'
> entry in SysFS caps folder. These patches are part of James' series
> "[PATCH v5 00/12] perf: arm_spe: Armv8.8 SPE features" [1]. In case this
> series will be merged independently of James' series, these patches have
> been included here.

Could you review the first two patches in this series, which change the
Arm SPE driver?

I verified this series can be applied cleanly on the Linux master
branch with the latest commit:

  07d9df80082b ("Merge tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools")

Thanks,
Leo


> [1] https://lore.kernel.org/linux-arm-kernel/20250721-james-perf-feat_spe_eft-v5-0-a7bc533485a1@linaro.org/


Thread overview: 17+ messages
2025-07-31 13:25 [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
2025-07-31 13:25 ` [PATCH v4 01/15] perf: arm_spe: Support FEAT_SPEv1p4 filters Leo Yan
2025-07-31 13:25 ` [PATCH v4 02/15] perf: arm_spe: Expose event filter Leo Yan
2025-07-31 13:25 ` [PATCH v4 03/15] perf arm_spe: Correct setting remote access Leo Yan
2025-07-31 13:25 ` [PATCH v4 04/15] perf arm_spe: Correct memory level for " Leo Yan
2025-07-31 13:25 ` [PATCH v4 05/15] perf arm_spe: Use full type for data_src Leo Yan
2025-07-31 13:25 ` [PATCH v4 06/15] perf arm_spe: Directly propagate raw event Leo Yan
2025-07-31 13:25 ` [PATCH v4 07/15] perf arm_spe: Decode event types for new features Leo Yan
2025-07-31 13:25 ` [PATCH v4 08/15] perf arm_spe: Add "event_filter" entry in meta data Leo Yan
2025-07-31 13:25 ` [PATCH v4 09/15] perf arm_spe: Refine memory level filling Leo Yan
2025-07-31 13:25 ` [PATCH v4 10/15] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
2025-07-31 13:25 ` [PATCH v4 11/15] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
2025-07-31 13:25 ` [PATCH v4 12/15] perf arm_spe: Improve CPU number retrieving in per-thread mode Leo Yan
2025-07-31 13:25 ` [PATCH v4 13/15] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
2025-07-31 13:25 ` [PATCH v4 14/15] perf arm_spe: Set HITM flag Leo Yan
2025-07-31 13:25 ` [PATCH v4 15/15] perf arm_spe: Allow parsing both data source and events Leo Yan
2025-08-29  8:00 ` [PATCH v4 00/15] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
