* [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4
@ 2025-06-13 15:53 Leo Yan
2025-06-13 15:53 ` [PATCH 01/12] drivers/perf: arm_spe: Store event reserved bits in driver data Leo Yan
` (11 more replies)
0 siblings, 12 replies; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
This series adds support for new event types introduced in Arm SPE v1.4.
The first two patches modify the Arm SPE driver to expose an 'events'
entry in the sysfs caps folder. This allows users to discover which
events are supported by the hardware.
Starting from patch 03, changes are made to the perf tool:
Patch 03 is a fix for setting the remote bit.
Patch 04 refactors the code to avoid duplicate definitions of event
bits.
Patch 05 dumps new event bits in raw format via the 'perf script -D'
command.
Patches 06 to 11 enhance memory-level information based on the new
events introduced in FEAT_SPEv1p4.
Patch 12 changes the logic to parse events after data source analysis.
The event information complements the data source and provides a more
complete view. As a result, Arm SPE can now support both HITM and peer
modes (See the "--display" options in perf c2c).
This series has been tested on the FVP RevC platform.
Note: for a local HITM event, the emulation does not provide any info
for LLC. However, the perf c2c tool relies on LLC + HITM for accounting
local HITM. I had to manually set the LLC HIT flag to verify the
"perf c2c -d tot" command.
---
Leo Yan (12):
drivers/perf: arm_spe: Store event reserved bits in driver data
drivers/perf: arm_spe: Expose events capability
perf arm_spe: Correct setting remote access
perf arm_spe: Directly propagate raw event
perf arm_spe: Decode event types for new features
perf arm_spe: Add "events" entry in meta data
perf arm_spe: Refine memory level filling
perf arm_spe: Separate setting of memory levels for loads and stores
perf arm_spe: Fill memory levels for FEAT_SPEv1p4
perf arm_spe: Refactor arm_spe__get_metadata_by_cpu()
perf arm_spe: Set HITM flag
perf arm_spe: Allow parsing both data source and events
drivers/perf/arm_spe_pmu.c | 15 +-
tools/perf/arch/arm64/util/arm-spe.c | 5 +
tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 37 +----
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 33 ++--
.../util/arm-spe-decoder/arm-spe-pkt-decoder.c | 14 ++
.../util/arm-spe-decoder/arm-spe-pkt-decoder.h | 7 +
tools/perf/util/arm-spe.c | 173 ++++++++++++++++-----
tools/perf/util/arm-spe.h | 2 +
8 files changed, 194 insertions(+), 92 deletions(-)
---
base-commit: 27605c8c0f69e319df156b471974e4e223035378
change-id: 20250610-arm_spe_support_hitm_overhead_v1_public-c4a263385434
Best regards,
--
Leo Yan <leo.yan@arm.com>
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 01/12] drivers/perf: arm_spe: Store event reserved bits in driver data
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-19 11:28 ` James Clark
2025-06-13 15:53 ` [PATCH 02/12] drivers/perf: arm_spe: Expose events capability Leo Yan
` (10 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Store the reserved event bits in the driver structure. This cached value
avoids redundant calls to arm_spe_pmsevfr_res0() each time a profiling
session is created.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
drivers/perf/arm_spe_pmu.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 3efed8839a4ec5604eba242cb620327cd2a6a87d..be2ed326bb794d7e5dd1d6cfa89330753ced3ca5 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -77,6 +77,7 @@ struct arm_spe_pmu {
u16 pmsver;
u16 min_period;
u16 counter_sz;
+ u64 pmsevfr_res0;
#define SPE_PMU_FEAT_FILT_EVT (1UL << 0)
#define SPE_PMU_FEAT_FILT_TYP (1UL << 1)
@@ -722,10 +723,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
!cpumask_test_cpu(event->cpu, &spe_pmu->supported_cpus))
return -ENOENT;
- if (arm_spe_event_to_pmsevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
+ if (arm_spe_event_to_pmsevfr(event) & spe_pmu->pmsevfr_res0)
return -EOPNOTSUPP;
- if (arm_spe_event_to_pmsnevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
+ if (arm_spe_event_to_pmsnevfr(event) & spe_pmu->pmsevfr_res0)
return -EOPNOTSUPP;
if (attr->exclude_idle)
@@ -1103,6 +1104,8 @@ static void __arm_spe_pmu_dev_probe(void *info)
spe_pmu->counter_sz = 16;
}
+ spe_pmu->pmsevfr_res0 = arm_spe_pmsevfr_res0(spe_pmu->pmsver);
+
dev_info(dev,
"probed SPEv1.%d for CPUs %*pbl [max_record_sz %u, align %u, features 0x%llx]\n",
spe_pmu->pmsver - 1, cpumask_pr_args(&spe_pmu->supported_cpus),
--
2.34.1
* [PATCH 02/12] drivers/perf: arm_spe: Expose events capability
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
2025-06-13 15:53 ` [PATCH 01/12] drivers/perf: arm_spe: Store event reserved bits in driver data Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-19 11:32 ` James Clark
2025-06-13 15:53 ` [PATCH 03/12] perf arm_spe: Correct setting remote access Leo Yan
` (9 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Expose an events entry in the caps folder to inform user space which
events are valid.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
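As a rough illustration of the capability described above, the sketch below
mirrors the "~pmsevfr_res0" computation and shows how user space might test
a single event bit in the resulting mask. The helper names are hypothetical
and are not part of the driver or of perf. The reserved mask used in the
usage note is simply the bitwise inverse of the "Events" value shown in the
patch 06 dump, not a value read from hardware.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the supported-events mask is the complement of the
 * reserved (RES0) bits, as in the SPE_PMU_CAP_EVENTS case of the patch.
 */
uint64_t spe_events_from_res0(uint64_t pmsevfr_res0)
{
	return ~pmsevfr_res0;
}

/* Hypothetical helper: return 1 if event bit 'bit' is advertised, else 0. */
int spe_event_supported(uint64_t events, unsigned int bit)
{
	assert(bit < 64);	/* the mask is 64 bits wide */
	return (int)((events >> bit) & 1);
}
```

With a reserved mask of 0x0000fffffc010001, the first helper yields
0xffff000003fefffe, in which bit 21 (EV_CACHE_DATA_MODIFIED in patch 05) is
set and bit 16 (EV_TRANSACTIONAL) is clear.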
drivers/perf/arm_spe_pmu.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index be2ed326bb794d7e5dd1d6cfa89330753ced3ca5..b59c394b715bc49af0ae30c521f97813a046755d 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -116,6 +116,7 @@ enum arm_spe_pmu_capabilities {
SPE_PMU_CAP_FEAT_MAX,
SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX,
SPE_PMU_CAP_MIN_IVAL,
+ SPE_PMU_CAP_EVENTS,
};
static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
@@ -123,7 +124,7 @@ static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
[SPE_PMU_CAP_ERND] = SPE_PMU_FEAT_ERND,
};
-static u32 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
+static u64 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
{
if (cap < SPE_PMU_CAP_FEAT_MAX)
return !!(spe_pmu->features & arm_spe_pmu_feat_caps[cap]);
@@ -133,6 +134,8 @@ static u32 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
return spe_pmu->counter_sz;
case SPE_PMU_CAP_MIN_IVAL:
return spe_pmu->min_period;
+ case SPE_PMU_CAP_EVENTS:
+ return ~spe_pmu->pmsevfr_res0;
default:
WARN(1, "unknown cap %d\n", cap);
}
@@ -149,7 +152,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
container_of(attr, struct dev_ext_attribute, attr);
int cap = (long)ea->var;
- return sysfs_emit(buf, "%u\n", arm_spe_pmu_cap_get(spe_pmu, cap));
+ return sysfs_emit(buf, "%llu\n", arm_spe_pmu_cap_get(spe_pmu, cap));
}
#define SPE_EXT_ATTR_ENTRY(_name, _func, _var) \
@@ -165,6 +168,7 @@ static struct attribute *arm_spe_pmu_cap_attr[] = {
SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
+ SPE_CAP_EXT_ATTR_ENTRY(events, SPE_PMU_CAP_EVENTS),
NULL,
};
--
2.34.1
* [PATCH 03/12] perf arm_spe: Correct setting remote access
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
2025-06-13 15:53 ` [PATCH 01/12] drivers/perf: arm_spe: Store event reserved bits in driver data Leo Yan
2025-06-13 15:53 ` [PATCH 02/12] drivers/perf: arm_spe: Expose events capability Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-19 13:53 ` James Clark
2025-06-13 15:53 ` [PATCH 04/12] perf arm_spe: Directly propagate raw event Leo Yan
` (8 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Set the mem_remote field for a remote access to appropriately represent
the event.
Fixes: a89dbc9b988f ("perf arm-spe: Set sample's data source field")
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index d46e0cccac99a36148b4daa37f2bf2342e6b47ef..fdef6f743cf3c76b1dcdd57f5a2c297642fdd21a 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -839,7 +839,7 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
}
if (record->type & ARM_SPE_REMOTE_ACCESS)
- data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
+ data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
}
static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
--
2.34.1
* [PATCH 04/12] perf arm_spe: Directly propagate raw event
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (2 preceding siblings ...)
2025-06-13 15:53 ` [PATCH 03/12] perf arm_spe: Correct setting remote access Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-19 14:13 ` James Clark
2025-06-13 15:53 ` [PATCH 05/12] perf arm_spe: Decode event types for new features Leo Yan
` (7 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Two separate sets of event bits are defined: one used in the frontend
code for generating samples and another for the backend decoder. Reduce
the redundancy by using the raw event bits directly in the frontend
code.
To avoid overflow issues, change the type of the event variable from
enum to u64.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
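A minimal sketch of why the record field is widened, assuming a stand-in
record type and a 64-bit variant of BIT(). All names here are hypothetical.
The point is only that a raw events payload can carry bits above bit 31,
which a plain C enum is not guaranteed to hold, while a 64-bit field stores
the payload verbatim.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit BIT() macro; not the kernel definition. */
#define SPE_SKETCH_BIT(n)	(UINT64_C(1) << (n))

/* Hypothetical record that keeps the events payload as-is. */
struct spe_record_sketch {
	uint64_t type;
};

/* Store the raw payload without translating it to a second bit layout. */
void spe_record_set_events(struct spe_record_sketch *rec, uint64_t payload)
{
	assert(rec);
	rec->type = payload;
}

/* Test one raw event bit directly on the stored payload. */
int spe_record_has_event(const struct spe_record_sketch *rec, unsigned int bit)
{
	assert(rec && bit < 64);
	return (rec->type & SPE_SKETCH_BIT(bit)) != 0;
}
```

In the usage below, bit 48 is picked only as an example of a high bit; it
does not refer to a specific architected event.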
tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 37 +----------------------
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 28 ++++++++---------
2 files changed, 14 insertions(+), 51 deletions(-)
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 688fe6d7524416420a7c18d5f8a268492ce7c3b8..96eb7cced6fd1574f5d823e4c67b9051dcf183ed 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -229,42 +229,7 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
}
break;
case ARM_SPE_EVENTS:
- if (payload & BIT(EV_L1D_REFILL))
- decoder->record.type |= ARM_SPE_L1D_MISS;
-
- if (payload & BIT(EV_L1D_ACCESS))
- decoder->record.type |= ARM_SPE_L1D_ACCESS;
-
- if (payload & BIT(EV_TLB_WALK))
- decoder->record.type |= ARM_SPE_TLB_MISS;
-
- if (payload & BIT(EV_TLB_ACCESS))
- decoder->record.type |= ARM_SPE_TLB_ACCESS;
-
- if (payload & BIT(EV_LLC_MISS))
- decoder->record.type |= ARM_SPE_LLC_MISS;
-
- if (payload & BIT(EV_LLC_ACCESS))
- decoder->record.type |= ARM_SPE_LLC_ACCESS;
-
- if (payload & BIT(EV_REMOTE_ACCESS))
- decoder->record.type |= ARM_SPE_REMOTE_ACCESS;
-
- if (payload & BIT(EV_MISPRED))
- decoder->record.type |= ARM_SPE_BRANCH_MISS;
-
- if (payload & BIT(EV_NOT_TAKEN))
- decoder->record.type |= ARM_SPE_BRANCH_NOT_TAKEN;
-
- if (payload & BIT(EV_TRANSACTIONAL))
- decoder->record.type |= ARM_SPE_IN_TXN;
-
- if (payload & BIT(EV_PARTIAL_PREDICATE))
- decoder->record.type |= ARM_SPE_SVE_PARTIAL_PRED;
-
- if (payload & BIT(EV_EMPTY_PREDICATE))
- decoder->record.type |= ARM_SPE_SVE_EMPTY_PRED;
-
+ decoder->record.type = payload;
break;
case ARM_SPE_DATA_SOURCE:
decoder->record.source = payload;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 881d9f29c1380b62486f0cd81498750ba06c4b50..03da55453da8fd2e7b9e2dcba3ddcf5243599e1c 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -13,20 +13,18 @@
#include "arm-spe-pkt-decoder.h"
-enum arm_spe_sample_type {
- ARM_SPE_L1D_ACCESS = 1 << 0,
- ARM_SPE_L1D_MISS = 1 << 1,
- ARM_SPE_LLC_ACCESS = 1 << 2,
- ARM_SPE_LLC_MISS = 1 << 3,
- ARM_SPE_TLB_ACCESS = 1 << 4,
- ARM_SPE_TLB_MISS = 1 << 5,
- ARM_SPE_BRANCH_MISS = 1 << 6,
- ARM_SPE_REMOTE_ACCESS = 1 << 7,
- ARM_SPE_SVE_PARTIAL_PRED = 1 << 8,
- ARM_SPE_SVE_EMPTY_PRED = 1 << 9,
- ARM_SPE_BRANCH_NOT_TAKEN = 1 << 10,
- ARM_SPE_IN_TXN = 1 << 11,
-};
+#define ARM_SPE_L1D_ACCESS BIT(EV_L1D_ACCESS)
+#define ARM_SPE_L1D_MISS BIT(EV_L1D_REFILL)
+#define ARM_SPE_LLC_ACCESS BIT(EV_LLC_ACCESS)
+#define ARM_SPE_LLC_MISS BIT(EV_LLC_MISS)
+#define ARM_SPE_TLB_ACCESS BIT(EV_TLB_ACCESS)
+#define ARM_SPE_TLB_MISS BIT(EV_TLB_WALK)
+#define ARM_SPE_BRANCH_MISS BIT(EV_MISPRED)
+#define ARM_SPE_BRANCH_NOT_TAKEN BIT(EV_NOT_TAKEN)
+#define ARM_SPE_REMOTE_ACCESS BIT(EV_REMOTE_ACCESS)
+#define ARM_SPE_SVE_PARTIAL_PRED BIT(EV_PARTIAL_PREDICATE)
+#define ARM_SPE_SVE_EMPTY_PRED BIT(EV_EMPTY_PREDICATE)
+#define ARM_SPE_IN_TXN BIT(EV_TRANSACTIONAL)
enum arm_spe_op_type {
/* First level operation type */
@@ -100,7 +98,7 @@ enum arm_spe_hisi_hip_data_source {
};
struct arm_spe_record {
- enum arm_spe_sample_type type;
+ u64 type;
int err;
u32 op;
u32 latency;
--
2.34.1
* [PATCH 05/12] perf arm_spe: Decode event types for new features
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (3 preceding siblings ...)
2025-06-13 15:53 ` [PATCH 04/12] perf arm_spe: Directly propagate raw event Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-19 14:20 ` James Clark
2025-06-13 15:53 ` [PATCH 06/12] perf arm_spe: Add "events" entry in meta data Leo Yan
` (6 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Decode new event types introduced by FEAT_SPEv1p4 and FEAT_SPE_SME.
The printed event names don't strictly follow the naming in the Arm ARM.
For example, the "Cache data modified" event is shown as "HITM", and the
"Data snooped" event is printed as "SNOOPED". Shorter names are easier
to read and review while preserving core meanings.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 14 ++++++++++++++
tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h | 7 +++++++
2 files changed, 21 insertions(+)
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
index 13cadb2f1ceac7a90e359c4d6aa1d5fc5169e142..80561630253dd5c46f7e99b24fc13b99f346459f 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
@@ -314,6 +314,20 @@ static int arm_spe_pkt_desc_event(const struct arm_spe_pkt *packet,
arm_spe_pkt_out_string(&err, &buf, &buf_len, " SVE-PARTIAL-PRED");
if (payload & BIT(EV_EMPTY_PREDICATE))
arm_spe_pkt_out_string(&err, &buf, &buf_len, " SVE-EMPTY-PRED");
+ if (payload & BIT(EV_L2D_ACCESS))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " L2D-ACCESS");
+ if (payload & BIT(EV_L2D_MISS))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " L2D-MISS");
+ if (payload & BIT(EV_CACHE_DATA_MODIFIED))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " HITM");
+ if (payload & BIT(EV_RECENTLY_FETCHED))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " LFB");
+ if (payload & BIT(EV_DATA_SNOOPED))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " SNOOPED");
+ if (payload & BIT(EV_STREAMING_SVE_MODE))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " STREAMING-SVE");
+ if (payload & BIT(EV_SMCU))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " SMCU");
return err;
}
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index 2cdf9f6da2681244291445d54c5b13fe8a2e9d9a..d00c2481712dcc457eab2f5e9848ffc3150e6236 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -108,6 +108,13 @@ enum arm_spe_events {
EV_TRANSACTIONAL = 16,
EV_PARTIAL_PREDICATE = 17,
EV_EMPTY_PREDICATE = 18,
+ EV_L2D_ACCESS = 19,
+ EV_L2D_MISS = 20,
+ EV_CACHE_DATA_MODIFIED = 21,
+ EV_RECENTLY_FETCHED = 22,
+ EV_DATA_SNOOPED = 23,
+ EV_STREAMING_SVE_MODE = 24,
+ EV_SMCU = 25,
};
/* Operation packet header */
--
2.34.1
* [PATCH 06/12] perf arm_spe: Add "events" entry in meta data
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (4 preceding siblings ...)
2025-06-13 15:53 ` [PATCH 05/12] perf arm_spe: Decode event types for new features Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-19 15:46 ` James Clark
2025-06-13 15:53 ` [PATCH 07/12] perf arm_spe: Refine memory level filling Leo Yan
` (5 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Add a new "events" entry in the meta data and dump it in raw data mode.
After:
# perf script -D
...
0 0 0x470 [0x1f0]: PERF_RECORD_AUXTRACE_INFO type: 4
Header version :2
Header size :4
PMU type v2 :11
CPU number :8
Magic :0x1010101010101010
CPU # :0
Num of params :4
MIDR :0x410fd0f0
PMU Type :11
Min Interval :256
Events :0xffff000003fefffe
...
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
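For illustration, a sketch of parsing the text of the caps/events entry
into a 64-bit mask. The driver side emits the value with "%llu", so the
sketch assumes a decimal string with an optional trailing newline. The
function name is hypothetical, and the sketch uses strtoull() instead of
perf_pmu__scan_file() so that it stands alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert the decimal text of caps/events to a mask.
 * Returns 0 on success and -1 if the text is not a valid number.
 */
int spe_parse_caps_events(const char *text, uint64_t *events)
{
	char *end;
	unsigned long long val;

	assert(text && events);

	errno = 0;
	val = strtoull(text, &end, 10);
	if (errno || end == text)
		return -1;	/* overflow or no digits consumed */
	if (*end != '\0' && *end != '\n')
		return -1;	/* trailing garbage */

	*events = (uint64_t)val;
	return 0;
}
```

The "Events" value in the dump above, 0xffff000003fefffe, would appear in
sysfs as the decimal string 18446462598799884286.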
tools/perf/arch/arm64/util/arm-spe.c | 5 +++++
tools/perf/util/arm-spe.c | 1 +
tools/perf/util/arm-spe.h | 2 ++
3 files changed, 8 insertions(+)
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 4f2833b62ff55f3fd1dff3f032d6e06528460939..5cce5f29d8c7936cc4f424a09d50536644c17be9 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -121,12 +121,17 @@ static int arm_spe_save_cpu_header(struct auxtrace_record *itr,
/* No Arm SPE PMU is found */
data[ARM_SPE_CPU_PMU_TYPE] = ULLONG_MAX;
data[ARM_SPE_CAP_MIN_IVAL] = 0;
+ data[ARM_SPE_CAP_EVENTS] = 0;
} else {
data[ARM_SPE_CPU_PMU_TYPE] = pmu->type;
if (perf_pmu__scan_file(pmu, "caps/min_interval", "%lu", &val) != 1)
val = 0;
data[ARM_SPE_CAP_MIN_IVAL] = val;
+
+ if (perf_pmu__scan_file(pmu, "caps/events", "%lu", &val) != 1)
+ val = 0;
+ data[ARM_SPE_CAP_EVENTS] = val;
}
free(cpuid);
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index fdef6f743cf3c76b1dcdd57f5a2c297642fdd21a..55b8391990467c8b7818bb63de3545d94d021bb7 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -1532,6 +1532,7 @@ static const char * const metadata_per_cpu_fmts[] = {
[ARM_SPE_CPU_MIDR] = " MIDR :0x%"PRIx64"\n",
[ARM_SPE_CPU_PMU_TYPE] = " PMU Type :%"PRId64"\n",
[ARM_SPE_CAP_MIN_IVAL] = " Min Interval :%"PRId64"\n",
+ [ARM_SPE_CAP_EVENTS] = " Events :0x%"PRIx64"\n",
};
static void arm_spe_print_info(struct arm_spe *spe, __u64 *arr)
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 390679a4af2fb61419bc881b5dc43c01f1dd77d7..a47d3d8fc97e07d1bb41a6776133d0676c335613 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -47,6 +47,8 @@ enum {
ARM_SPE_CPU_PMU_TYPE,
/* Minimal interval */
ARM_SPE_CAP_MIN_IVAL,
+ /* Supported events */
+ ARM_SPE_CAP_EVENTS,
ARM_SPE_CPU_PRIV_MAX,
};
--
2.34.1
* [PATCH 07/12] perf arm_spe: Refine memory level filling
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (5 preceding siblings ...)
2025-06-13 15:53 ` [PATCH 06/12] perf arm_spe: Add "events" entry in meta data Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-20 10:27 ` James Clark
2025-06-13 15:53 ` [PATCH 08/12] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
` (4 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Introduce macros for detecting the cache level and cache misses.
Populate the 'mem_lvl_num' field, which is a more recently added
attribute for representing the memory level. Set the memory levels to NA
("not available") if memory hierarchy info is absent.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
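The macro pattern and the NA fallback can be sketched in isolation as
follows. The SK_ names, the bit positions of the L1D/LLC events and the
small result struct are assumptions made for this sketch; the real code
fills union perf_mem_data_src instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed event bit positions, for illustration only. */
#define SK_L1D_ACCESS	(UINT64_C(1) << 2)
#define SK_L1D_MISS	(UINT64_C(1) << 3)
#define SK_LLC_ACCESS	(UINT64_C(1) << 8)
#define SK_LLC_MISS	(UINT64_C(1) << 9)

/* Token pasting builds SK_<lvl>_ACCESS and SK_<lvl>_MISS from 'lvl'. */
#define SK_CACHE_EVENT(lvl)		(SK_##lvl##_ACCESS | SK_##lvl##_MISS)
#define sk_is_cache_level(type, lvl)	(((type) & SK_CACHE_EVENT(lvl)) != 0)
#define sk_is_cache_miss(type, lvl)	(((type) & SK_##lvl##_MISS) != 0)

enum sk_level { SK_LVL_NA, SK_LVL_L1, SK_LVL_L3 };

/* Hypothetical stand-in for the mem_lvl / mem_lvl_num fields. */
struct sk_mem_info {
	enum sk_level level;
	int miss;
};

void sk_fill_memory_level(uint64_t type, struct sk_mem_info *info)
{
	assert(info);

	/* Default to "not available" when no cache event is present. */
	info->level = SK_LVL_NA;
	info->miss = 0;

	if (sk_is_cache_level(type, LLC)) {
		info->level = SK_LVL_L3;
		info->miss = sk_is_cache_miss(type, LLC);
	} else if (sk_is_cache_level(type, L1D)) {
		info->level = SK_LVL_L1;
		info->miss = sk_is_cache_miss(type, L1D);
	}
}
```

Adding another level then only needs two more bit definitions and one more
branch, which is what the macros are meant to make cheap.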
tools/perf/util/arm-spe.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 55b8391990467c8b7818bb63de3545d94d021bb7..b2296cd025382ea36820641164ec71b13a4e7a0e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -39,6 +39,15 @@
#define is_ldst_op(op) (!!((op) & ARM_SPE_OP_LDST))
+#define ARM_SPE_CACHE_EVENT(lvl) \
+ (ARM_SPE_##lvl##_ACCESS | ARM_SPE_##lvl##_MISS)
+
+#define arm_spe_is_cache_level(type, lvl) \
+ ((type) & ARM_SPE_CACHE_EVENT(lvl))
+
+#define arm_spe_is_cache_miss(type, lvl) \
+ ((type) & ARM_SPE_##lvl##_MISS)
+
struct arm_spe {
struct auxtrace auxtrace;
struct auxtrace_queues queues;
@@ -822,20 +831,21 @@ static const struct data_source_handle data_source_handles[] = {
static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
union perf_mem_data_src *data_src)
{
- if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
+ if (arm_spe_is_cache_level(record->type, LLC)) {
data_src->mem_lvl = PERF_MEM_LVL_L3;
-
- if (record->type & ARM_SPE_LLC_MISS)
- data_src->mem_lvl |= PERF_MEM_LVL_MISS;
- else
- data_src->mem_lvl |= PERF_MEM_LVL_HIT;
- } else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
+ data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
+ PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ } else if (arm_spe_is_cache_level(record->type, L1D)) {
data_src->mem_lvl = PERF_MEM_LVL_L1;
+ data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L1D) ?
+ PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ }
- if (record->type & ARM_SPE_L1D_MISS)
- data_src->mem_lvl |= PERF_MEM_LVL_MISS;
- else
- data_src->mem_lvl |= PERF_MEM_LVL_HIT;
+ if (!data_src->mem_lvl) {
+ data_src->mem_lvl = PERF_MEM_LVL_NA;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
}
if (record->type & ARM_SPE_REMOTE_ACCESS)
--
2.34.1
* [PATCH 08/12] perf arm_spe: Separate setting of memory levels for loads and stores
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (6 preceding siblings ...)
2025-06-13 15:53 ` [PATCH 07/12] perf arm_spe: Refine memory level filling Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-20 10:30 ` James Clark
2025-06-13 15:53 ` [PATCH 09/12] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
` (3 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
For a load hit, the lowest-level cache that hits reflects the latency of
fetching the data. Otherwise, the highest-level cache involved in
refilling indicates the overhead caused by a load.
Store operations remain unchanged to keep the descending order when
iterating through cache levels.
Split the code into two functions: one sets memory levels for loads and
the other for stores.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
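The load ordering described above (hits searched upwards from L1, misses
searched downwards from the last level) can be sketched as a small
classifier. The LD_ names, the bit positions and the result enum are
hypothetical; only the search order is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed event bit positions, for illustration only. */
#define LD_L1D_ACCESS	(UINT64_C(1) << 2)
#define LD_L1D_MISS	(UINT64_C(1) << 3)
#define LD_LLC_ACCESS	(UINT64_C(1) << 8)
#define LD_LLC_MISS	(UINT64_C(1) << 9)

#define LD_CACHE_EVENT(lvl)	(LD_##lvl##_ACCESS | LD_##lvl##_MISS)
/* A hit means the level was accessed and did not miss. */
#define ld_is_hit(type, lvl) \
	(((type) & LD_CACHE_EVENT(lvl)) == LD_##lvl##_ACCESS)
#define ld_is_miss(type, lvl)	(((type) & LD_##lvl##_MISS) != 0)

enum ld_result { LD_NA, LD_L1_HIT, LD_L3_HIT, LD_L3_MISS, LD_L1_MISS };

/* Hypothetical stand-in for struct arm_spe_record. */
struct ld_record {
	uint64_t type;
};

enum ld_result ld_classify(const struct ld_record *rec)
{
	assert(rec);

	/* Hits: ascending order, the closest cache that hit wins. */
	if (ld_is_hit(rec->type, L1D))
		return LD_L1_HIT;
	if (ld_is_hit(rec->type, LLC))
		return LD_L3_HIT;

	/* Misses: descending order, the farthest cache that missed wins. */
	if (ld_is_miss(rec->type, LLC))
		return LD_L3_MISS;
	if (ld_is_miss(rec->type, L1D))
		return LD_L1_MISS;

	return LD_NA;
}
```

For example, a load that misses L1 but is served by the last-level cache is
reported as an L3 hit rather than an L1 miss.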
tools/perf/util/arm-spe.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index b2296cd025382ea36820641164ec71b13a4e7a0e..8f18af7336db53b00b450eb4299feee350d0ecb9 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -45,6 +45,9 @@
#define arm_spe_is_cache_level(type, lvl) \
((type) & ARM_SPE_CACHE_EVENT(lvl))
+#define arm_spe_is_cache_hit(type, lvl) \
+ (((type) & ARM_SPE_CACHE_EVENT(lvl)) == ARM_SPE_##lvl##_ACCESS)
+
#define arm_spe_is_cache_miss(type, lvl) \
((type) & ARM_SPE_##lvl##_MISS)
@@ -828,9 +831,38 @@ static const struct data_source_handle data_source_handles[] = {
DS(hisi_hip_ds_encoding_cpus, data_source_hisi_hip),
};
-static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
- union perf_mem_data_src *data_src)
+static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
+ union perf_mem_data_src *data_src)
+{
+ /*
+ * To find a cache hit, search in ascending order from the lower level
+ * caches to the higher level caches. This reflects the best scenario
+ * for a cache hit.
+ */
+ if (arm_spe_is_cache_hit(record->type, L1D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ } else if (arm_spe_is_cache_hit(record->type, LLC)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ /*
+ * To find a cache miss, search in descending order from the higher
+ * level cache to the lower level cache. This represents the worst
+ * scenario for a cache miss.
+ */
+ } else if (arm_spe_is_cache_miss(record->type, LLC)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_MISS;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ } else if (arm_spe_is_cache_miss(record->type, L1D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_MISS;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ }
+}
+
+static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
+ union perf_mem_data_src *data_src)
{
+ /* Record the greatest level info for a store operation. */
if (arm_spe_is_cache_level(record->type, LLC)) {
data_src->mem_lvl = PERF_MEM_LVL_L3;
data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
@@ -842,6 +874,15 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
}
+}
+
+static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
+ union perf_mem_data_src *data_src)
+{
+ if (data_src->mem_op == PERF_MEM_OP_LOAD)
+ arm_spe__synth_ld_memory_level(record, data_src);
+ if (data_src->mem_op == PERF_MEM_OP_STORE)
+ arm_spe__synth_st_memory_level(record, data_src);
if (!data_src->mem_lvl) {
data_src->mem_lvl = PERF_MEM_LVL_NA;
--
2.34.1
* [PATCH 09/12] perf arm_spe: Fill memory levels for FEAT_SPEv1p4
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (7 preceding siblings ...)
2025-06-13 15:53 ` [PATCH 08/12] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-20 10:37 ` James Clark
2025-06-13 15:53 ` [PATCH 10/12] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
` (2 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Starting with FEAT_SPEv1p4, Arm SPE provides information on Level 2 data
cache and recently fetched events. This patch fills in the memory levels
for these new events.
The recently fetched events are mapped to the line-fill buffer (LFB). In
general, the latency for accessing the LFB is higher than accessing the
L1 cache but lower than accessing the L2 cache. Thus, it sits between the
L1 cache and the L2 cache in the memory hierarchy information.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
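A sketch of the resulting hit order for loads, with the LFB placed between
L1 and L2. The HO_ names and the result enum are hypothetical. The L2D and
recently-fetched bit positions follow the enum values added in patch 05,
while the L1D and LLC positions are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define HO_L1D_ACCESS		(UINT64_C(1) << 2)	/* assumed */
#define HO_L1D_MISS		(UINT64_C(1) << 3)	/* assumed */
#define HO_LLC_ACCESS		(UINT64_C(1) << 8)	/* assumed */
#define HO_LLC_MISS		(UINT64_C(1) << 9)	/* assumed */
#define HO_L2D_ACCESS		(UINT64_C(1) << 19)
#define HO_L2D_MISS		(UINT64_C(1) << 20)
#define HO_RECENTLY_FETCH	(UINT64_C(1) << 22)

#define HO_CACHE_EVENT(lvl)	(HO_##lvl##_ACCESS | HO_##lvl##_MISS)
#define ho_is_hit(type, lvl) \
	(((type) & HO_CACHE_EVENT(lvl)) == HO_##lvl##_ACCESS)

enum ho_level { HO_NA, HO_L1, HO_LFB, HO_L2, HO_L3 };

/* Hypothetical stand-in for struct arm_spe_record. */
struct ho_record {
	uint64_t type;
};

/* Return the closest level that served a load: L1, LFB, L2, then L3. */
enum ho_level ho_load_hit_level(const struct ho_record *rec)
{
	assert(rec);

	if (ho_is_hit(rec->type, L1D))
		return HO_L1;
	if (rec->type & HO_RECENTLY_FETCH)
		return HO_LFB;
	if (ho_is_hit(rec->type, L2D))
		return HO_L2;
	if (ho_is_hit(rec->type, LLC))
		return HO_L3;

	return HO_NA;
}
```

Because the LFB check sits before the L2 check, a record that carries both
the recently-fetched bit and an L2 hit is reported as LFB.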
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 3 +++
tools/perf/util/arm-spe.c | 14 ++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 03da55453da8fd2e7b9e2dcba3ddcf5243599e1c..90c76928c7bf1b35cec538abdb0e88d6083fe81b 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -25,6 +25,9 @@
#define ARM_SPE_SVE_PARTIAL_PRED BIT(EV_PARTIAL_PREDICATE)
#define ARM_SPE_SVE_EMPTY_PRED BIT(EV_EMPTY_PREDICATE)
#define ARM_SPE_IN_TXN BIT(EV_TRANSACTIONAL)
+#define ARM_SPE_L2D_ACCESS BIT(EV_L2D_ACCESS)
+#define ARM_SPE_L2D_MISS BIT(EV_L2D_MISS)
+#define ARM_SPE_RECENTLY_FETCH BIT(EV_RECENTLY_FETCHED)
enum arm_spe_op_type {
/* First level operation type */
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 8f18af7336db53b00b450eb4299feee350d0ecb9..2ab38d21d52f73617451a6a79f9d5ae931a34f49 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -842,6 +842,12 @@ static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
if (arm_spe_is_cache_hit(record->type, L1D)) {
data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ } else if (record->type & ARM_SPE_RECENTLY_FETCH) {
+ data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_LFB;
+ } else if (arm_spe_is_cache_hit(record->type, L2D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
} else if (arm_spe_is_cache_hit(record->type, LLC)) {
data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
@@ -853,6 +859,9 @@ static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
} else if (arm_spe_is_cache_miss(record->type, LLC)) {
data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_MISS;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ } else if (arm_spe_is_cache_miss(record->type, L2D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_MISS;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
} else if (arm_spe_is_cache_miss(record->type, L1D)) {
data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_MISS;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
@@ -868,6 +877,11 @@ static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ } else if (arm_spe_is_cache_level(record->type, L2D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L2;
+ data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L2D) ?
+ PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
} else if (arm_spe_is_cache_level(record->type, L1D)) {
data_src->mem_lvl = PERF_MEM_LVL_L1;
data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L1D) ?
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 10/12] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu()
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (8 preceding siblings ...)
2025-06-13 15:53 ` [PATCH 09/12] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-20 10:45 ` James Clark
2025-06-13 15:53 ` [PATCH 11/12] perf arm_spe: Set HITM flag Leo Yan
2025-06-13 15:53 ` [PATCH 12/12] perf arm_spe: Allow parsing both data source and events Leo Yan
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Handle "CPU=-1" (per-thread mode) in the arm_spe__get_metadata_by_cpu()
function. As a result, the function is more general and can be invoked
by a subsequent change.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
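Note for readers: the lookup described in the commit message can be
sketched in isolation as below. This is a minimal illustration and not
the perf code itself; struct sketch_spe, sketch_get_metadata_by_cpu()
and the SKETCH_* indices are hypothetical stand-ins. The sketch assumes
the CPU number is passed as a signed int, because a "less than zero"
test can never be true on an unsigned type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the perf metadata layout. */
enum { SKETCH_CPU = 0, SKETCH_MIDR = 1 };

struct sketch_spe {
	uint64_t **metadata;	/* one array per recorded CPU, or NULL */
	int metadata_nr_cpu;
	bool is_homogeneous;
};

/* cpu is signed so that -1 (per-thread mode) can be detected. */
static uint64_t *sketch_get_metadata_by_cpu(const struct sketch_spe *spe,
					    int cpu)
{
	int i;

	assert(spe != NULL);

	if (!spe->metadata || spe->metadata_nr_cpu <= 0)
		return NULL;

	if (cpu < 0) {
		/* Per-thread mode: the CPU is unknown. */
		if (!spe->is_homogeneous)
			return NULL;

		/* All CPUs are alike, so the first entry is good enough. */
		return spe->metadata[0];
	}

	for (i = 0; i < spe->metadata_nr_cpu; i++)
		if (spe->metadata[i][SKETCH_CPU] == (uint64_t)cpu)
			return spe->metadata[i];

	return NULL;
}
```

With two recorded CPUs, a lookup for CPU 1 returns that CPU's entry,
while a lookup for -1 returns the first entry on a homogeneous system
and NULL otherwise.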
tools/perf/util/arm-spe.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 2ab38d21d52f73617451a6a79f9d5ae931a34f49..8e93b0d151a98714d0c5e5f6ceec386a2aa63ad0 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -324,6 +324,19 @@ static u64 *arm_spe__get_metadata_by_cpu(struct arm_spe *spe, u64 cpu)
if (!spe->metadata)
return NULL;
+ /* CPU ID is -1 for per-thread mode */
+ if ((s64)cpu < 0) {
+ /*
+ * On a heterogeneous system the CPU ID is -1, so the
+ * matching meta data cannot be determined.
+ */
+ if (!spe->is_homogeneous)
+ return NULL;
+
+ /* In homogeneous system, simply use CPU0's metadata */
+ return spe->metadata[0];
+ }
+
for (i = 0; i < spe->metadata_nr_cpu; i++)
if (spe->metadata[i][ARM_SPE_CPU] == cpu)
return spe->metadata[i];
@@ -924,22 +937,7 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
cpuid = perf_env__cpuid(spe->session->evlist->env);
midr = strtol(cpuid, NULL, 16);
} else {
- /* CPU ID is -1 for per-thread mode */
- if (speq->cpu < 0) {
- /*
- * On the heterogeneous system, due to CPU ID is -1,
- * cannot confirm the data source packet is supported.
- */
- if (!spe->is_homogeneous)
- return false;
-
- /* In homogeneous system, simply use CPU0's metadata */
- if (spe->metadata)
- metadata = spe->metadata[0];
- } else {
- metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
- }
-
+ metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
if (!metadata)
return false;
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 11/12] perf arm_spe: Set HITM flag
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (9 preceding siblings ...)
2025-06-13 15:53 ` [PATCH 10/12] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-20 10:51 ` James Clark
2025-06-13 15:53 ` [PATCH 12/12] perf arm_spe: Allow parsing both data source and events Leo Yan
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Since FEAT_SPEv1p4, Arm SPE provides two extra events: "Cache data
modified" and "Data snooped".
Set the snoop mode as follows:
- If both the "Cache data modified" event and the "Data snooped" event
are set, which indicates a load operation that snooped from an outside
cache and hit a modified copy, set the HITM flag so that false sharing
can be inspected.
- If the snooped event bit is not set and the hardware supports the
snooped event, set NONE mode (no snoop operation).
- If the snooped event bit is not set and the event is not supported, or
the events info is absent from the meta data, set NA mode (not
available).
Don't set any mode for the "Cache data modified" event alone, as it hits
a local modified copy.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
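Note for readers: the decision described above can be sketched as a
small stand-alone function. This is an illustration under assumptions:
sketch_snoop_mode() and the SKETCH_* names are hypothetical, and the
enum merely stands in for the PERF_MEM_SNOOP_* values. The bit positions
are the ones listed in enum arm_spe_events. The sketch follows the diff
below, where a snooped access without the modified bit is reported as a
plain HIT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event bit positions as listed in enum arm_spe_events. */
#define SKETCH_EV_CACHE_DATA_MODIFIED	21
#define SKETCH_EV_DATA_SNOOPED		23
#define SKETCH_BIT(n)			(UINT64_C(1) << (n))

static_assert(SKETCH_EV_DATA_SNOOPED < 64, "event bits must fit in a u64");

/* Hypothetical stand-in for the PERF_MEM_SNOOP_* values. */
enum sketch_snoop {
	SKETCH_SNOOP_NA,
	SKETCH_SNOOP_NONE,
	SKETCH_SNOOP_HIT,
	SKETCH_SNOOP_HITM,
};

/*
 * events:     raw event payload of one record
 * cap_events: supported-events mask from the meta data, NULL if absent
 */
static enum sketch_snoop sketch_snoop_mode(uint64_t events,
					   const uint64_t *cap_events)
{
	const uint64_t snooped = SKETCH_BIT(SKETCH_EV_DATA_SNOOPED);
	const uint64_t modified = SKETCH_BIT(SKETCH_EV_CACHE_DATA_MODIFIED);

	if (events & snooped)
		return (events & modified) ? SKETCH_SNOOP_HITM
					   : SKETCH_SNOOP_HIT;

	/* No snoop bit: distinguish "not supported" from "no snoop". */
	if (!cap_events || !(*cap_events & snooped))
		return SKETCH_SNOOP_NA;

	return SKETCH_SNOOP_NONE;
}
```

Passing NULL for cap_events models a recording without the events entry
in its meta data, which yields the NA mode.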
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 2 ++
tools/perf/util/arm-spe.c | 26 +++++++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 90c76928c7bf1b35cec538abdb0e88d6083fe81b..a2b48b0c87712f232587023eeaa66a9b83aed382 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -28,6 +28,8 @@
#define ARM_SPE_L2D_ACCESS BIT(EV_L2D_ACCESS)
#define ARM_SPE_L2D_MISS BIT(EV_L2D_MISS)
#define ARM_SPE_RECENTLY_FETCH BIT(EV_RECENTLY_FETCHED)
+#define ARM_SPE_DATA_SNOOPED BIT(EV_DATA_SNOOPED)
+#define ARM_SPE_HITM BIT(EV_CACHE_DATA_MODIFIED)
enum arm_spe_op_type {
/* First level operation type */
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 8e93b0d151a98714d0c5e5f6ceec386a2aa63ad0..8a889f727f9cd5351b4ca027935112eddd16ea6c 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -903,9 +903,12 @@ static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
}
}
-static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
+static void arm_spe__synth_memory_level(struct arm_spe_queue *speq,
+ const struct arm_spe_record *record,
union perf_mem_data_src *data_src)
{
+ struct arm_spe *spe = speq->spe;
+
if (data_src->mem_op == PERF_MEM_OP_LOAD)
arm_spe__synth_ld_memory_level(record, data_src);
if (data_src->mem_op == PERF_MEM_OP_STORE)
@@ -916,6 +919,25 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
}
+ if (record->type & ARM_SPE_DATA_SNOOPED) {
+ if (record->type & ARM_SPE_HITM)
+ data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+ else
+ data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
+ } else {
+ u64 *metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
+
+ /*
+ * Set NA ("Not available") mode if no meta data or the
+ * SNOOPED event is not supported.
+ */
+ if (!metadata ||
+ !(metadata[ARM_SPE_CAP_EVENTS] & ARM_SPE_DATA_SNOOPED))
+ data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+ else
+ data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+ }
+
if (record->type & ARM_SPE_REMOTE_ACCESS)
data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
}
@@ -971,7 +993,7 @@ static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
return 0;
if (!arm_spe__synth_ds(speq, record, &data_src))
- arm_spe__synth_memory_level(record, &data_src);
+ arm_spe__synth_memory_level(speq, record, &data_src);
if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
data_src.mem_dtlb = PERF_MEM_TLB_WK;
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 12/12] perf arm_spe: Allow parsing both data source and events
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (10 preceding siblings ...)
2025-06-13 15:53 ` [PATCH 11/12] perf arm_spe: Set HITM flag Leo Yan
@ 2025-06-13 15:53 ` Leo Yan
2025-06-20 10:55 ` James Clark
11 siblings, 1 reply; 28+ messages in thread
From: Leo Yan @ 2025-06-13 15:53 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
The current code skips parsing events once a data source has been
generated. The reason is that data source packets carry cache and
snooping related info, so the subsequent event packets might contain
duplicate info.
This commit continues parsing the events after data source analysis. If
the data source does not provide the memory level and snooping type,
the event info is used to synthesize the related fields.
As a result, both the peer snoop option ('-d peer') and the HITM options
('-d tot/lcl/rmt') are supported by Arm SPE in perf c2c.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
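Note for readers: the "data source first, events as fallback" ordering
can be sketched as below. This is a simplified illustration, not the
perf implementation; struct sketch_data_src, sketch_fill_from_events()
and the SKETCH_* values are hypothetical placeholders, with 0 standing
for "not set yet". The real code additionally consults the meta data to
choose between the NONE and NA snoop modes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder values; 0 means "not set yet". */
enum { SKETCH_LVL_NA = 1, SKETCH_LVL_L1_HIT = 2, SKETCH_LVL_PEER = 3 };
enum { SKETCH_SNOOP_NA = 1, SKETCH_SNOOP_HITM = 2 };

struct sketch_data_src {
	unsigned int mem_lvl;
	unsigned int mem_snoop;
};

/*
 * Second step: values derived from the event packet only fill the
 * fields that the data source step left unset.
 */
static void sketch_fill_from_events(struct sketch_data_src *ds,
				    unsigned int ev_lvl,
				    unsigned int ev_snoop)
{
	assert(ds != NULL);

	if (!ds->mem_lvl)
		ds->mem_lvl = ev_lvl ? ev_lvl : SKETCH_LVL_NA;

	if (!ds->mem_snoop)
		ds->mem_snoop = ev_snoop ? ev_snoop : SKETCH_SNOOP_NA;
}
```

A memory level already set by the data source step is kept as-is, while
an unset snoop field is taken from the events.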
tools/perf/util/arm-spe.c | 69 ++++++++++++++++++++++++++++-------------------
1 file changed, 41 insertions(+), 28 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 8a889f727f9cd5351b4ca027935112eddd16ea6c..8fde6f6cbce92aabf20d25b01ee2ade4aae7ea61 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -909,40 +909,54 @@ static void arm_spe__synth_memory_level(struct arm_spe_queue *speq,
{
struct arm_spe *spe = speq->spe;
- if (data_src->mem_op == PERF_MEM_OP_LOAD)
- arm_spe__synth_ld_memory_level(record, data_src);
- if (data_src->mem_op == PERF_MEM_OP_STORE)
- arm_spe__synth_st_memory_level(record, data_src);
+ /*
+ * The data source packet contains more info about cache levels for
+ * peer snooping, so respect the memory level if it has already been
+ * set by data source parsing.
+ */
+ if (!data_src->mem_lvl) {
+ if (data_src->mem_op == PERF_MEM_OP_LOAD)
+ arm_spe__synth_ld_memory_level(record, data_src);
+ if (data_src->mem_op == PERF_MEM_OP_STORE)
+ arm_spe__synth_st_memory_level(record, data_src);
+ }
if (!data_src->mem_lvl) {
data_src->mem_lvl = PERF_MEM_LVL_NA;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
}
- if (record->type & ARM_SPE_DATA_SNOOPED) {
- if (record->type & ARM_SPE_HITM)
- data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
- else
- data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
- } else {
- u64 *metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
-
- /*
- * Set NA ("Not available") mode if no meta data or the
- * SNOOPED event is not supported.
- */
- if (!metadata ||
- !(metadata[ARM_SPE_CAP_EVENTS] & ARM_SPE_DATA_SNOOPED))
- data_src->mem_snoop = PERF_MEM_SNOOP_NA;
- else
- data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+ /*
+ * If 'mem_snoop' has already been set by the data source packet,
+ * do not set it again here.
+ */
+ if (!data_src->mem_snoop) {
+ if (record->type & ARM_SPE_DATA_SNOOPED) {
+ if (record->type & ARM_SPE_HITM)
+ data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+ else
+ data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
+ } else {
+ u64 *metadata =
+ arm_spe__get_metadata_by_cpu(spe, speq->cpu);
+
+ /*
+ * Set NA ("Not available") mode if no meta data or the
+ * SNOOPED event is not supported.
+ */
+ if (!metadata ||
+ !(metadata[ARM_SPE_CAP_EVENTS] & ARM_SPE_DATA_SNOOPED))
+ data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+ else
+ data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+ }
}
if (record->type & ARM_SPE_REMOTE_ACCESS)
data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
}
-static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
+static void arm_spe__synth_ds(struct arm_spe_queue *speq,
const struct arm_spe_record *record,
union perf_mem_data_src *data_src)
{
@@ -961,19 +975,18 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
} else {
metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
if (!metadata)
- return false;
+ return;
midr = metadata[ARM_SPE_CPU_MIDR];
}
for (i = 0; i < ARRAY_SIZE(data_source_handles); i++) {
if (is_midr_in_range_list(midr, data_source_handles[i].midr_ranges)) {
- data_source_handles[i].ds_synth(record, data_src);
- return true;
+ return data_source_handles[i].ds_synth(record, data_src);
}
}
- return false;
+ return;
}
static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
@@ -992,8 +1005,8 @@ static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
else
return 0;
- if (!arm_spe__synth_ds(speq, record, &data_src))
- arm_spe__synth_memory_level(speq, record, &data_src);
+ arm_spe__synth_ds(speq, record, &data_src);
+ arm_spe__synth_memory_level(speq, record, &data_src);
if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
data_src.mem_dtlb = PERF_MEM_TLB_WK;
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 01/12] drivers/perf: arm_spe: Store event reserved bits in driver data
2025-06-13 15:53 ` [PATCH 01/12] drivers/perf: arm_spe: Store event reserved bits in driver data Leo Yan
@ 2025-06-19 11:28 ` James Clark
2025-06-19 16:22 ` Leo Yan
0 siblings, 1 reply; 28+ messages in thread
From: James Clark @ 2025-06-19 11:28 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> Store the reserved event bits in the driver structure. This cached value
> avoids redundant calls to arm_spe_pmsevfr_res0() each time a profiling
> session is created.
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> drivers/perf/arm_spe_pmu.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 3efed8839a4ec5604eba242cb620327cd2a6a87d..be2ed326bb794d7e5dd1d6cfa89330753ced3ca5 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -77,6 +77,7 @@ struct arm_spe_pmu {
> u16 pmsver;
> u16 min_period;
> u16 counter_sz;
> + u64 pmsevfr_res0;
IMO this is a premature optimization and we shouldn't store things that
are only transformations of some other already stored value. It becomes
another thing to worry about when it's valid to read and to potentially
keep it up to date etc.
It's just one new one now, but eventually you end up with tons of them
and then someone forgets to update a dependent one when the parent value
changes and it makes a mess.
And arm_spe_pmu_event_init() isn't even a particularly hot path.
>
> #define SPE_PMU_FEAT_FILT_EVT (1UL << 0)
> #define SPE_PMU_FEAT_FILT_TYP (1UL << 1)
> @@ -722,10 +723,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
> !cpumask_test_cpu(event->cpu, &spe_pmu->supported_cpus))
> return -ENOENT;
>
> - if (arm_spe_event_to_pmsevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
> + if (arm_spe_event_to_pmsevfr(event) & spe_pmu->pmsevfr_res0)
> return -EOPNOTSUPP;
>
> - if (arm_spe_event_to_pmsnevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
> + if (arm_spe_event_to_pmsnevfr(event) & spe_pmu->pmsevfr_res0)
> return -EOPNOTSUPP;
>
> if (attr->exclude_idle)
> @@ -1103,6 +1104,8 @@ static void __arm_spe_pmu_dev_probe(void *info)
> spe_pmu->counter_sz = 16;
> }
>
> + spe_pmu->pmsevfr_res0 = arm_spe_pmsevfr_res0(spe_pmu->pmsver);
> +
> dev_info(dev,
> "probed SPEv1.%d for CPUs %*pbl [max_record_sz %u, align %u, features 0x%llx]\n",
> spe_pmu->pmsver - 1, cpumask_pr_args(&spe_pmu->supported_cpus),
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 02/12] drivers/perf: arm_spe: Expose events capability
2025-06-13 15:53 ` [PATCH 02/12] drivers/perf: arm_spe: Expose events capability Leo Yan
@ 2025-06-19 11:32 ` James Clark
2025-06-19 16:24 ` Leo Yan
0 siblings, 1 reply; 28+ messages in thread
From: James Clark @ 2025-06-19 11:32 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> Expose an events entry in the caps folder to inform user space which
> events are valid.
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> drivers/perf/arm_spe_pmu.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index be2ed326bb794d7e5dd1d6cfa89330753ced3ca5..b59c394b715bc49af0ae30c521f97813a046755d 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -116,6 +116,7 @@ enum arm_spe_pmu_capabilities {
> SPE_PMU_CAP_FEAT_MAX,
> SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX,
> SPE_PMU_CAP_MIN_IVAL,
> + SPE_PMU_CAP_EVENTS,
> };
>
> static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
> @@ -123,7 +124,7 @@ static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
> [SPE_PMU_CAP_ERND] = SPE_PMU_FEAT_ERND,
> };
>
> -static u32 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
> +static u64 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
> {
> if (cap < SPE_PMU_CAP_FEAT_MAX)
> return !!(spe_pmu->features & arm_spe_pmu_feat_caps[cap]);
> @@ -133,6 +134,8 @@ static u32 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
> return spe_pmu->counter_sz;
> case SPE_PMU_CAP_MIN_IVAL:
> return spe_pmu->min_period;
> + case SPE_PMU_CAP_EVENTS:
> + return ~spe_pmu->pmsevfr_res0;
> default:
> WARN(1, "unknown cap %d\n", cap);
> }
> @@ -149,7 +152,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
> container_of(attr, struct dev_ext_attribute, attr);
> int cap = (long)ea->var;
>
> - return sysfs_emit(buf, "%u\n", arm_spe_pmu_cap_get(spe_pmu, cap));
> + return sysfs_emit(buf, "%llu\n", arm_spe_pmu_cap_get(spe_pmu, cap));
> }
>
> #define SPE_EXT_ATTR_ENTRY(_name, _func, _var) \
> @@ -165,6 +168,7 @@ static struct attribute *arm_spe_pmu_cap_attr[] = {
> SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
> SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
> SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
> + SPE_CAP_EXT_ATTR_ENTRY(events, SPE_PMU_CAP_EVENTS),
Would "event_filters" be a better name? Technically an SPE version can
support a filter but the event itself might not be implemented.
> NULL,
> };
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 03/12] perf arm_spe: Correct setting remote access
2025-06-13 15:53 ` [PATCH 03/12] perf arm_spe: Correct setting remote access Leo Yan
@ 2025-06-19 13:53 ` James Clark
2025-06-19 16:45 ` Leo Yan
0 siblings, 1 reply; 28+ messages in thread
From: James Clark @ 2025-06-19 13:53 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> Set the mem_remote field for a remote access to appropriately represent
> the event.
>
> Fixes: a89dbc9b988f ("perf arm-spe: Set sample's data source field")
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> tools/perf/util/arm-spe.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index d46e0cccac99a36148b4daa37f2bf2342e6b47ef..fdef6f743cf3c76b1dcdd57f5a2c297642fdd21a 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -839,7 +839,7 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
> }
>
> if (record->type & ARM_SPE_REMOTE_ACCESS)
> - data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
> + data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
Should this not avoid overwriting mem_remote if it's already set by
arm_spe__synth_ds()? We do that for mem_lvl above.
We also still set mem_lvl = PERF_MEM_LVL_REM_CCE1 in
arm_spe__synth_data_source_common(), if it's not right it should
probably be changed there too.
> }
>
> static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 04/12] perf arm_spe: Directly propagate raw event
2025-06-13 15:53 ` [PATCH 04/12] perf arm_spe: Directly propagate raw event Leo Yan
@ 2025-06-19 14:13 ` James Clark
0 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2025-06-19 14:13 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> Two separate sets of event bits are defined: one used in the code
> for generating samples and another for the backend decoder. Reduce
> the redundancy by using the raw event bits directly in the frontend
> code.
>
> To avoid overflow issues, change the type of the event variable from
> enum to u64.
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
> ---
> tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 37 +----------------------
> tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 28 ++++++++---------
> 2 files changed, 14 insertions(+), 51 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> index 688fe6d7524416420a7c18d5f8a268492ce7c3b8..96eb7cced6fd1574f5d823e4c67b9051dcf183ed 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> @@ -229,42 +229,7 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
> }
> break;
> case ARM_SPE_EVENTS:
> - if (payload & BIT(EV_L1D_REFILL))
> - decoder->record.type |= ARM_SPE_L1D_MISS;
> -
> - if (payload & BIT(EV_L1D_ACCESS))
> - decoder->record.type |= ARM_SPE_L1D_ACCESS;
> -
> - if (payload & BIT(EV_TLB_WALK))
> - decoder->record.type |= ARM_SPE_TLB_MISS;
> -
> - if (payload & BIT(EV_TLB_ACCESS))
> - decoder->record.type |= ARM_SPE_TLB_ACCESS;
> -
> - if (payload & BIT(EV_LLC_MISS))
> - decoder->record.type |= ARM_SPE_LLC_MISS;
> -
> - if (payload & BIT(EV_LLC_ACCESS))
> - decoder->record.type |= ARM_SPE_LLC_ACCESS;
> -
> - if (payload & BIT(EV_REMOTE_ACCESS))
> - decoder->record.type |= ARM_SPE_REMOTE_ACCESS;
> -
> - if (payload & BIT(EV_MISPRED))
> - decoder->record.type |= ARM_SPE_BRANCH_MISS;
> -
> - if (payload & BIT(EV_NOT_TAKEN))
> - decoder->record.type |= ARM_SPE_BRANCH_NOT_TAKEN;
> -
> - if (payload & BIT(EV_TRANSACTIONAL))
> - decoder->record.type |= ARM_SPE_IN_TXN;
> -
> - if (payload & BIT(EV_PARTIAL_PREDICATE))
> - decoder->record.type |= ARM_SPE_SVE_PARTIAL_PRED;
> -
> - if (payload & BIT(EV_EMPTY_PREDICATE))
> - decoder->record.type |= ARM_SPE_SVE_EMPTY_PRED;
> -
> + decoder->record.type = payload;
> break;
> case ARM_SPE_DATA_SOURCE:
> decoder->record.source = payload;
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> index 881d9f29c1380b62486f0cd81498750ba06c4b50..03da55453da8fd2e7b9e2dcba3ddcf5243599e1c 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> @@ -13,20 +13,18 @@
>
> #include "arm-spe-pkt-decoder.h"
>
> -enum arm_spe_sample_type {
> - ARM_SPE_L1D_ACCESS = 1 << 0,
> - ARM_SPE_L1D_MISS = 1 << 1,
> - ARM_SPE_LLC_ACCESS = 1 << 2,
> - ARM_SPE_LLC_MISS = 1 << 3,
> - ARM_SPE_TLB_ACCESS = 1 << 4,
> - ARM_SPE_TLB_MISS = 1 << 5,
> - ARM_SPE_BRANCH_MISS = 1 << 6,
> - ARM_SPE_REMOTE_ACCESS = 1 << 7,
> - ARM_SPE_SVE_PARTIAL_PRED = 1 << 8,
> - ARM_SPE_SVE_EMPTY_PRED = 1 << 9,
> - ARM_SPE_BRANCH_NOT_TAKEN = 1 << 10,
> - ARM_SPE_IN_TXN = 1 << 11,
> -};
> +#define ARM_SPE_L1D_ACCESS BIT(EV_L1D_ACCESS)
> +#define ARM_SPE_L1D_MISS BIT(EV_L1D_REFILL)
> +#define ARM_SPE_LLC_ACCESS BIT(EV_LLC_ACCESS)
> +#define ARM_SPE_LLC_MISS BIT(EV_LLC_MISS)
> +#define ARM_SPE_TLB_ACCESS BIT(EV_TLB_ACCESS)
> +#define ARM_SPE_TLB_MISS BIT(EV_TLB_WALK)
> +#define ARM_SPE_BRANCH_MISS BIT(EV_MISPRED)
> +#define ARM_SPE_BRANCH_NOT_TAKEN BIT(EV_NOT_TAKEN)
> +#define ARM_SPE_REMOTE_ACCESS BIT(EV_REMOTE_ACCESS)
> +#define ARM_SPE_SVE_PARTIAL_PRED BIT(EV_PARTIAL_PREDICATE)
> +#define ARM_SPE_SVE_EMPTY_PRED BIT(EV_EMPTY_PREDICATE)
> +#define ARM_SPE_IN_TXN BIT(EV_TRANSACTIONAL)
>
> enum arm_spe_op_type {
> /* First level operation type */
> @@ -100,7 +98,7 @@ enum arm_spe_hisi_hip_data_source {
> };
>
> struct arm_spe_record {
> - enum arm_spe_sample_type type;
> + u64 type;
> int err;
> u32 op;
> u32 latency;
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 05/12] perf arm_spe: Decode event types for new features
2025-06-13 15:53 ` [PATCH 05/12] perf arm_spe: Decode event types for new features Leo Yan
@ 2025-06-19 14:20 ` James Clark
0 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2025-06-19 14:20 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> Decode new event types introduced by FEAT_SPEv1p4 and FEAT_SPE_SME.
>
> The printed event names don't strictly follow the naming in the Arm ARM.
> For example, the "Cache data modified" event is shown as "HITM", and the
> "Data snooped" event is printed as "SNOOPED". Shorter names are easier
> to read and review while preserving core meanings.
>
Reviewed-by: James Clark <james.clark@linaro.org>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 14 ++++++++++++++
> tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h | 7 +++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> index 13cadb2f1ceac7a90e359c4d6aa1d5fc5169e142..80561630253dd5c46f7e99b24fc13b99f346459f 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> @@ -314,6 +314,20 @@ static int arm_spe_pkt_desc_event(const struct arm_spe_pkt *packet,
> arm_spe_pkt_out_string(&err, &buf, &buf_len, " SVE-PARTIAL-PRED");
> if (payload & BIT(EV_EMPTY_PREDICATE))
> arm_spe_pkt_out_string(&err, &buf, &buf_len, " SVE-EMPTY-PRED");
> + if (payload & BIT(EV_L2D_ACCESS))
> + arm_spe_pkt_out_string(&err, &buf, &buf_len, " L2D-ACCESS");
> + if (payload & BIT(EV_L2D_MISS))
> + arm_spe_pkt_out_string(&err, &buf, &buf_len, " L2D-MISS");
> + if (payload & BIT(EV_CACHE_DATA_MODIFIED))
> + arm_spe_pkt_out_string(&err, &buf, &buf_len, " HITM");
> + if (payload & BIT(EV_RECENTLY_FETCHED))
> + arm_spe_pkt_out_string(&err, &buf, &buf_len, " LFB");
> + if (payload & BIT(EV_DATA_SNOOPED))
> + arm_spe_pkt_out_string(&err, &buf, &buf_len, " SNOOPED");
> + if (payload & BIT(EV_STREAMING_SVE_MODE))
> + arm_spe_pkt_out_string(&err, &buf, &buf_len, " STREAMING-SVE");
> + if (payload & BIT(EV_SMCU))
> + arm_spe_pkt_out_string(&err, &buf, &buf_len, " SMCU");
>
> return err;
> }
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
> index 2cdf9f6da2681244291445d54c5b13fe8a2e9d9a..d00c2481712dcc457eab2f5e9848ffc3150e6236 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
> @@ -108,6 +108,13 @@ enum arm_spe_events {
> EV_TRANSACTIONAL = 16,
> EV_PARTIAL_PREDICATE = 17,
> EV_EMPTY_PREDICATE = 18,
> + EV_L2D_ACCESS = 19,
> + EV_L2D_MISS = 20,
> + EV_CACHE_DATA_MODIFIED = 21,
> + EV_RECENTLY_FETCHED = 22,
> + EV_DATA_SNOOPED = 23,
> + EV_STREAMING_SVE_MODE = 24,
> + EV_SMCU = 25,
> };
>
> /* Operation packet header */
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 06/12] perf arm_spe: Add "events" entry in meta data
2025-06-13 15:53 ` [PATCH 06/12] perf arm_spe: Add "events" entry in meta data Leo Yan
@ 2025-06-19 15:46 ` James Clark
0 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2025-06-19 15:46 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> Add a new "events" entry in the meta data and dump it in raw data mode.
>
> After:
>
> # perf script -D
> ...
>
> 0 0 0x470 [0x1f0]: PERF_RECORD_AUXTRACE_INFO type: 4
> Header version :2
> Header size :4
> PMU type v2 :11
> CPU number :8
> Magic :0x1010101010101010
> CPU # :0
> Num of params :4
> MIDR :0x410fd0f0
> PMU Type :11
> Min Interval :256
> Events :0xffff000003fefffe
>
Pending possibly renaming this field on patch 2
Reviewed-by: James Clark <james.clark@linaro.org>
> ...
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> tools/perf/arch/arm64/util/arm-spe.c | 5 +++++
> tools/perf/util/arm-spe.c | 1 +
> tools/perf/util/arm-spe.h | 2 ++
> 3 files changed, 8 insertions(+)
>
> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
> index 4f2833b62ff55f3fd1dff3f032d6e06528460939..5cce5f29d8c7936cc4f424a09d50536644c17be9 100644
> --- a/tools/perf/arch/arm64/util/arm-spe.c
> +++ b/tools/perf/arch/arm64/util/arm-spe.c
> @@ -121,12 +121,17 @@ static int arm_spe_save_cpu_header(struct auxtrace_record *itr,
> /* No Arm SPE PMU is found */
> data[ARM_SPE_CPU_PMU_TYPE] = ULLONG_MAX;
> data[ARM_SPE_CAP_MIN_IVAL] = 0;
> + data[ARM_SPE_CAP_EVENTS] = 0;
> } else {
> data[ARM_SPE_CPU_PMU_TYPE] = pmu->type;
>
> if (perf_pmu__scan_file(pmu, "caps/min_interval", "%lu", &val) != 1)
> val = 0;
> data[ARM_SPE_CAP_MIN_IVAL] = val;
> +
> + if (perf_pmu__scan_file(pmu, "caps/events", "%lu", &val) != 1)
> + val = 0;
> + data[ARM_SPE_CAP_EVENTS] = val;
> }
>
> free(cpuid);
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index fdef6f743cf3c76b1dcdd57f5a2c297642fdd21a..55b8391990467c8b7818bb63de3545d94d021bb7 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -1532,6 +1532,7 @@ static const char * const metadata_per_cpu_fmts[] = {
> [ARM_SPE_CPU_MIDR] = " MIDR :0x%"PRIx64"\n",
> [ARM_SPE_CPU_PMU_TYPE] = " PMU Type :%"PRId64"\n",
> [ARM_SPE_CAP_MIN_IVAL] = " Min Interval :%"PRId64"\n",
> + [ARM_SPE_CAP_EVENTS] = " Events :0x%"PRIx64"\n",
> };
>
> static void arm_spe_print_info(struct arm_spe *spe, __u64 *arr)
> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
> index 390679a4af2fb61419bc881b5dc43c01f1dd77d7..a47d3d8fc97e07d1bb41a6776133d0676c335613 100644
> --- a/tools/perf/util/arm-spe.h
> +++ b/tools/perf/util/arm-spe.h
> @@ -47,6 +47,8 @@ enum {
> ARM_SPE_CPU_PMU_TYPE,
> /* Minimal interval */
> ARM_SPE_CAP_MIN_IVAL,
> + /* Supported events */
> + ARM_SPE_CAP_EVENTS,
> ARM_SPE_CPU_PRIV_MAX,
> };
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 01/12] drivers/perf: arm_spe: Store event reserved bits in driver data
2025-06-19 11:28 ` James Clark
@ 2025-06-19 16:22 ` Leo Yan
0 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2025-06-19 16:22 UTC (permalink / raw)
To: James Clark
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On Thu, Jun 19, 2025 at 12:28:27PM +0100, James Clark wrote:
[...]
> > @@ -77,6 +77,7 @@ struct arm_spe_pmu {
> > u16 pmsver;
> > u16 min_period;
> > u16 counter_sz;
> > + u64 pmsevfr_res0;
>
> IMO this is a premature optimization and we shouldn't store things that are
> only transformations of some other already stored value. It becomes another
> thing to worry about when it's valid to read and to potentially keep it up
> to date etc.
>
> It's just one new one now, but eventually you end up with tons of them and
> then someone forgets to update a dependent one when the parent value changes
> and it makes a mess.
>
> And arm_spe_pmu_event_init() isn't even a particularly hot path.
Makes sense. I will drop this one.
Thanks for the suggestions!
Leo
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 02/12] drivers/perf: arm_spe: Expose events capability
2025-06-19 11:32 ` James Clark
@ 2025-06-19 16:24 ` Leo Yan
0 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2025-06-19 16:24 UTC (permalink / raw)
To: James Clark
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On Thu, Jun 19, 2025 at 12:32:16PM +0100, James Clark wrote:
[...]
> > @@ -165,6 +168,7 @@ static struct attribute *arm_spe_pmu_cap_attr[] = {
> > SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
> > SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
> > SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
> > + SPE_CAP_EXT_ATTR_ENTRY(events, SPE_PMU_CAP_EVENTS),
>
> Would "event_filters" be a better name? Technically an SPE version can
> support a filter but the event itself might not be implemented.
Good point! I will update this in the next spin.
Thanks,
Leo
* Re: [PATCH 03/12] perf arm_spe: Correct setting remote access
2025-06-19 13:53 ` James Clark
@ 2025-06-19 16:45 ` Leo Yan
0 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2025-06-19 16:45 UTC (permalink / raw)
To: James Clark
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On Thu, Jun 19, 2025 at 02:53:37PM +0100, James Clark wrote:
>
>
> On 13/06/2025 4:53 pm, Leo Yan wrote:
> > Set the mem_remote field for a remote access to appropriately represent
> > the event.
> >
> > Fixes: a89dbc9b988f ("perf arm-spe: Set sample's data source field")
> > Signed-off-by: Leo Yan <leo.yan@arm.com>
> > ---
> > tools/perf/util/arm-spe.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> > index d46e0cccac99a36148b4daa37f2bf2342e6b47ef..fdef6f743cf3c76b1dcdd57f5a2c297642fdd21a 100644
> > --- a/tools/perf/util/arm-spe.c
> > +++ b/tools/perf/util/arm-spe.c
> > @@ -839,7 +839,7 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
> > }
> > if (record->type & ARM_SPE_REMOTE_ACCESS)
> > - data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
> > + data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
>
> Should this not avoid overwriting mem_remote if it's already set by
> arm_spe__synth_ds()? We do that for mem_lvl above.
This does not clear the 'mem_remote' field if it has already been set; it
only ever writes 1, so overwriting an already-set field is harmless.
I will add a condition anyway for consistency.
> We also still set mem_lvl = PERF_MEM_LVL_REM_CCE1 in
> arm_spe__synth_data_source_common(), if it's not right it should probably be
> changed there too.
I will change it to PERF_MEM_LVL_NA / PERF_MEM_LVLNUM_NA in
arm_spe__synth_data_source_common() for remote socket access.
Thanks!
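
The agreed change can be sketched roughly as follows. This is a minimal
illustration, not the code from the next spin: the struct, the ex_* names
and the bit values are stand-ins invented for this example, and only the
idea (record remoteness in 'mem_remote', and leave a value set by an
earlier stage alone) comes from the discussion above.

```c
#include <stdint.h>

/* Illustrative stand-ins; the real definitions live in the perf headers. */
#define EX_REMOTE_ACCESS (1u << 0) /* hypothetical event bit */
#define EX_MEM_REMOTE    1u        /* plays the role of PERF_MEM_REMOTE_REMOTE */

struct ex_data_src {
	uint32_t mem_lvl;    /* cache level plus hit/miss flags */
	uint32_t mem_remote; /* non-zero when the access went to a remote socket */
};

static void ex_set_remote(uint32_t type, struct ex_data_src *ds)
{
	/* Do not touch a value that data source parsing already filled in. */
	if ((type & EX_REMOTE_ACCESS) && !ds->mem_remote)
		ds->mem_remote = EX_MEM_REMOTE;
}
```

The memory level field is deliberately left untouched here, so a remote
access no longer changes which cache level is reported.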
* Re: [PATCH 07/12] perf arm_spe: Refine memory level filling
2025-06-13 15:53 ` [PATCH 07/12] perf arm_spe: Refine memory level filling Leo Yan
@ 2025-06-20 10:27 ` James Clark
0 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2025-06-20 10:27 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> This commit introduces macros for detecting the cache level and cache
> misses.
>
> It also populates the 'mem_lvl_num' field, an attribute that was added
> later for representing the memory level, and sets the memory levels to NA
> ("not available") if memory hierarchy info is absent.
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
> ---
> tools/perf/util/arm-spe.c | 32 +++++++++++++++++++++-----------
> 1 file changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 55b8391990467c8b7818bb63de3545d94d021bb7..b2296cd025382ea36820641164ec71b13a4e7a0e 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -39,6 +39,15 @@
>
> #define is_ldst_op(op) (!!((op) & ARM_SPE_OP_LDST))
>
> +#define ARM_SPE_CACHE_EVENT(lvl) \
> + (ARM_SPE_##lvl##_ACCESS | ARM_SPE_##lvl##_MISS)
> +
> +#define arm_spe_is_cache_level(type, lvl) \
> + ((type) & ARM_SPE_CACHE_EVENT(lvl))
> +
> +#define arm_spe_is_cache_miss(type, lvl) \
> + ((type) & ARM_SPE_##lvl##_MISS)
> +
> struct arm_spe {
> struct auxtrace auxtrace;
> struct auxtrace_queues queues;
> @@ -822,20 +831,21 @@ static const struct data_source_handle data_source_handles[] = {
> static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
> union perf_mem_data_src *data_src)
> {
> - if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
> + if (arm_spe_is_cache_level(record->type, LLC)) {
> data_src->mem_lvl = PERF_MEM_LVL_L3;
> -
> - if (record->type & ARM_SPE_LLC_MISS)
> - data_src->mem_lvl |= PERF_MEM_LVL_MISS;
> - else
> - data_src->mem_lvl |= PERF_MEM_LVL_HIT;
> - } else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
> + data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
> + PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> + } else if (arm_spe_is_cache_level(record->type, L1D)) {
> data_src->mem_lvl = PERF_MEM_LVL_L1;
> + data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L1D) ?
> + PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
> + }
>
> - if (record->type & ARM_SPE_L1D_MISS)
> - data_src->mem_lvl |= PERF_MEM_LVL_MISS;
> - else
> - data_src->mem_lvl |= PERF_MEM_LVL_HIT;
> + if (!data_src->mem_lvl) {
> + data_src->mem_lvl = PERF_MEM_LVL_NA;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
> }
>
> if (record->type & ARM_SPE_REMOTE_ACCESS)
>
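
For readers unfamiliar with the token-pasting trick used by the new macros,
here is a small standalone sketch. The EX_* bit values are made up for the
example (the real ones come from the SPE event packet layout), and the "!!"
normalisation is an addition of this sketch that the quoted macros do not
have.

```c
#include <stdint.h>

/* Stand-in event bits for this sketch only. */
#define EX_L1D_ACCESS (1u << 0)
#define EX_L1D_MISS   (1u << 1)
#define EX_LLC_ACCESS (1u << 2)
#define EX_LLC_MISS   (1u << 3)

/* "lvl" is pasted into the constant names, e.g. LLC -> EX_LLC_ACCESS. */
#define EX_CACHE_EVENT(lvl) \
	(EX_##lvl##_ACCESS | EX_##lvl##_MISS)

/* Non-zero if the record carries any event for the given cache level. */
#define ex_is_cache_level(type, lvl) \
	(!!((type) & EX_CACHE_EVENT(lvl)))

/* Non-zero if the record carries the miss event for the given level. */
#define ex_is_cache_miss(type, lvl) \
	(!!((type) & EX_##lvl##_MISS))
```

With these definitions, ex_is_cache_level(type, LLC) expands to a test
against (EX_LLC_ACCESS | EX_LLC_MISS), so adding another level only needs
a new pair of constants.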
* Re: [PATCH 08/12] perf arm_spe: Separate setting of memory levels for loads and stores
2025-06-13 15:53 ` [PATCH 08/12] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
@ 2025-06-20 10:30 ` James Clark
0 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2025-06-20 10:30 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> For a load hit, the lowest-level cache reflects the latency of fetching
> the data. Otherwise, the highest-level cache involved in refilling
> indicates the overhead caused by a load.
>
> Store operations remain unchanged to keep the descending order when
> iterating through cache levels.
>
> Split into two functions: one is for setting memory levels for loads and
> another for stores.
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
> ---
> tools/perf/util/arm-spe.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index b2296cd025382ea36820641164ec71b13a4e7a0e..8f18af7336db53b00b450eb4299feee350d0ecb9 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -45,6 +45,9 @@
> #define arm_spe_is_cache_level(type, lvl) \
> ((type) & ARM_SPE_CACHE_EVENT(lvl))
>
> +#define arm_spe_is_cache_hit(type, lvl) \
> + (((type) & ARM_SPE_CACHE_EVENT(lvl)) == ARM_SPE_##lvl##_ACCESS)
> +
> #define arm_spe_is_cache_miss(type, lvl) \
> ((type) & ARM_SPE_##lvl##_MISS)
>
> @@ -828,9 +831,38 @@ static const struct data_source_handle data_source_handles[] = {
> DS(hisi_hip_ds_encoding_cpus, data_source_hisi_hip),
> };
>
> -static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
> - union perf_mem_data_src *data_src)
> +static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
> + union perf_mem_data_src *data_src)
> +{
> + /*
> + * To find a cache hit, search in ascending order from the lower level
> + * caches to the higher level caches. This reflects the best scenario
> + * for a cache hit.
> + */
> + if (arm_spe_is_cache_hit(record->type, L1D)) {
> + data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
> + } else if (arm_spe_is_cache_hit(record->type, LLC)) {
> + data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> + /*
> + * To find a cache miss, search in descending order from the higher
> + * level cache to the lower level cache. This represents the worst
> + * scenario for a cache miss.
> + */
> + } else if (arm_spe_is_cache_miss(record->type, LLC)) {
> + data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_MISS;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> + } else if (arm_spe_is_cache_miss(record->type, L1D)) {
> + data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_MISS;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
> + }
> +}
> +
> +static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
> + union perf_mem_data_src *data_src)
> {
> + /* Record the greatest level info for a store operation. */
> if (arm_spe_is_cache_level(record->type, LLC)) {
> data_src->mem_lvl = PERF_MEM_LVL_L3;
> data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
> @@ -842,6 +874,15 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
> PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
> }
> +}
> +
> +static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
> + union perf_mem_data_src *data_src)
> +{
> + if (data_src->mem_op == PERF_MEM_OP_LOAD)
> + arm_spe__synth_ld_memory_level(record, data_src);
> + if (data_src->mem_op == PERF_MEM_OP_STORE)
> + arm_spe__synth_st_memory_level(record, data_src);
>
> if (!data_src->mem_lvl) {
> data_src->mem_lvl = PERF_MEM_LVL_NA;
>
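
The difference between the two search orders can be seen in a reduced
sketch. Everything named ex_* or EX_* below is invented for illustration,
including the bit values and the result struct; only the ordering of the
checks follows the quoted diff.

```c
#include <stdint.h>

/* Stand-in event bits for this sketch only. */
#define EX_L1D_ACCESS (1u << 0)
#define EX_L1D_MISS   (1u << 1)
#define EX_LLC_ACCESS (1u << 2)
#define EX_LLC_MISS   (1u << 3)

#define EX_CACHE_EVENT(lvl) (EX_##lvl##_ACCESS | EX_##lvl##_MISS)
#define ex_is_level(t, lvl) (!!((t) & EX_CACHE_EVENT(lvl)))
#define ex_is_hit(t, lvl)   (((t) & EX_CACHE_EVENT(lvl)) == EX_##lvl##_ACCESS)
#define ex_is_miss(t, lvl)  (!!((t) & EX_##lvl##_MISS))

enum ex_level { EX_LVL_NONE, EX_LVL_L1, EX_LVL_L3 };

struct ex_result {
	enum ex_level level;
	int miss;
};

static struct ex_result ex_make(enum ex_level level, int miss)
{
	struct ex_result r = { level, miss };
	return r;
}

/* Loads: nearest hit first, then the farthest miss. */
static struct ex_result ex_load_level(uint32_t t)
{
	if (ex_is_hit(t, L1D))
		return ex_make(EX_LVL_L1, 0);
	if (ex_is_hit(t, LLC))
		return ex_make(EX_LVL_L3, 0);
	if (ex_is_miss(t, LLC))
		return ex_make(EX_LVL_L3, 1);
	if (ex_is_miss(t, L1D))
		return ex_make(EX_LVL_L1, 1);
	return ex_make(EX_LVL_NONE, 0);
}

/* Stores: always report the farthest level that saw any event. */
static struct ex_result ex_store_level(uint32_t t)
{
	if (ex_is_level(t, LLC))
		return ex_make(EX_LVL_L3, ex_is_miss(t, LLC));
	if (ex_is_level(t, L1D))
		return ex_make(EX_LVL_L1, ex_is_miss(t, L1D));
	return ex_make(EX_LVL_NONE, 0);
}
```

For a record that carries access events for both L1D and LLC with no miss
bits, the load path reports L1 while the store path reports L3, which is
the behavioural split the commit message describes.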
* Re: [PATCH 09/12] perf arm_spe: Fill memory levels for FEAT_SPEv1p4
2025-06-13 15:53 ` [PATCH 09/12] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
@ 2025-06-20 10:37 ` James Clark
0 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2025-06-20 10:37 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> Starting with FEAT_SPEv1p4, Arm SPE provides information on Level 2 data
> cache and recently fetched events. This patch fills in the memory levels
> for these new events.
>
> The recently fetched events are mapped to the line-fill buffer (LFB). In
> general, the latency for accessing the LFB is higher than for the L1 cache
> but lower than for the L2 cache. Thus, it sits between the L1 cache and
> the L2 cache in the memory hierarchy information.
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 3 +++
> tools/perf/util/arm-spe.c | 14 ++++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> index 03da55453da8fd2e7b9e2dcba3ddcf5243599e1c..90c76928c7bf1b35cec538abdb0e88d6083fe81b 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> @@ -25,6 +25,9 @@
> #define ARM_SPE_SVE_PARTIAL_PRED BIT(EV_PARTIAL_PREDICATE)
> #define ARM_SPE_SVE_EMPTY_PRED BIT(EV_EMPTY_PREDICATE)
> #define ARM_SPE_IN_TXN BIT(EV_TRANSACTIONAL)
> +#define ARM_SPE_L2D_ACCESS BIT(EV_L2D_ACCESS)
> +#define ARM_SPE_L2D_MISS BIT(EV_L2D_MISS)
> +#define ARM_SPE_RECENTLY_FETCH BIT(EV_RECENTLY_FETCHED)
Nit: ARM_SPE_RECENTLY_FETCH -> ARM_SPE_RECENTLY_FETCHED
Reviewed-by: James Clark <james.clark@linaro.org>
>
> enum arm_spe_op_type {
> /* First level operation type */
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 8f18af7336db53b00b450eb4299feee350d0ecb9..2ab38d21d52f73617451a6a79f9d5ae931a34f49 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -842,6 +842,12 @@ static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
> if (arm_spe_is_cache_hit(record->type, L1D)) {
> data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
> + } else if (record->type & ARM_SPE_RECENTLY_FETCH) {
> + data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_LFB;
> + } else if (arm_spe_is_cache_hit(record->type, L2D)) {
> + data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
> } else if (arm_spe_is_cache_hit(record->type, LLC)) {
> data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> @@ -853,6 +859,9 @@ static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
> } else if (arm_spe_is_cache_miss(record->type, LLC)) {
> data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_MISS;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> + } else if (arm_spe_is_cache_miss(record->type, L2D)) {
> + data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_MISS;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
> } else if (arm_spe_is_cache_miss(record->type, L1D)) {
> data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_MISS;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
> @@ -868,6 +877,11 @@ static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
> data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
> PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> + } else if (arm_spe_is_cache_level(record->type, L2D)) {
> + data_src->mem_lvl = PERF_MEM_LVL_L2;
> + data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L2D) ?
> + PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
> } else if (arm_spe_is_cache_level(record->type, L1D)) {
> data_src->mem_lvl = PERF_MEM_LVL_L1;
> data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L1D) ?
>
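
As a rough illustration of where the LFB slots into the load hit search,
the sketch below walks the levels in the order L1, LFB, L2, L3. The enum,
the function name and the bit positions are assumptions made for this
example and do not match the real event bit layout.

```c
#include <stdint.h>

/* Stand-in event bits for this sketch only. */
#define EX_L1D_ACCESS       (1u << 0)
#define EX_L1D_MISS         (1u << 1)
#define EX_L2D_ACCESS       (1u << 2)
#define EX_L2D_MISS         (1u << 3)
#define EX_LLC_ACCESS       (1u << 4)
#define EX_LLC_MISS         (1u << 5)
#define EX_RECENTLY_FETCHED (1u << 6)

#define EX_CACHE_EVENT(lvl) (EX_##lvl##_ACCESS | EX_##lvl##_MISS)
#define ex_is_hit(t, lvl)   (((t) & EX_CACHE_EVENT(lvl)) == EX_##lvl##_ACCESS)

enum ex_hit_level { EX_HIT_NONE, EX_HIT_L1, EX_HIT_LFB, EX_HIT_L2, EX_HIT_L3 };

/* Return the nearest level that satisfied the load, or EX_HIT_NONE. */
static enum ex_hit_level ex_load_hit_level(uint32_t t)
{
	if (ex_is_hit(t, L1D))
		return EX_HIT_L1;
	/* Recently fetched data is treated as a line-fill buffer hit. */
	if (t & EX_RECENTLY_FETCHED)
		return EX_HIT_LFB;
	if (ex_is_hit(t, L2D))
		return EX_HIT_L2;
	if (ex_is_hit(t, LLC))
		return EX_HIT_L3;
	return EX_HIT_NONE;
}
```

A record that missed in L1D but has the recently-fetched bit set is
reported as an LFB hit even if an L2 access event is also present.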
* Re: [PATCH 10/12] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu()
2025-06-13 15:53 ` [PATCH 10/12] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
@ 2025-06-20 10:45 ` James Clark
0 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2025-06-20 10:45 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> Handle "CPU=-1" (per-thread mode) in the arm_spe__get_metadata_by_cpu()
> function. As a result, the function is more general and will be invoked
> by a subsequent change.
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> tools/perf/util/arm-spe.c | 30 ++++++++++++++----------------
> 1 file changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 2ab38d21d52f73617451a6a79f9d5ae931a34f49..8e93b0d151a98714d0c5e5f6ceec386a2aa63ad0 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -324,6 +324,19 @@ static u64 *arm_spe__get_metadata_by_cpu(struct arm_spe *spe, u64 cpu)
> if (!spe->metadata)
> return NULL;
>
> + /* CPU ID is -1 for per-thread mode */
> + if (cpu < 0) {
> + /*
> + * On the heterogeneous system, due to CPU ID is -1,
> + * cannot confirm the meta data.
> + */
> + if (!spe->is_homogeneous)
> + return NULL;
> +
> + /* In homogeneous system, simply use CPU0's metadata */
> + return spe->metadata[0];
> + }
> +
> for (i = 0; i < spe->metadata_nr_cpu; i++)
> if (spe->metadata[i][ARM_SPE_CPU] == cpu)
> return spe->metadata[i];
> @@ -924,22 +937,7 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
> cpuid = perf_env__cpuid(spe->session->evlist->env);
> midr = strtol(cpuid, NULL, 16);
> } else {
> - /* CPU ID is -1 for per-thread mode */
> - if (speq->cpu < 0) {
> - /*
> - * On the heterogeneous system, due to CPU ID is -1,
> - * cannot confirm the data source packet is supported.
> - */
> - if (!spe->is_homogeneous)
> - return false;
> -
> - /* In homogeneous system, simply use CPU0's metadata */
> - if (spe->metadata)
> - metadata = spe->metadata[0];
> - } else {
> - metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
> - }
> -
> + metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
> if (!metadata)
> return false;
>
>
Reviewed-by: James Clark <james.clark@linaro.org>
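
The lookup rule being consolidated here can be sketched in isolation. The
struct and the ex_* names are simplified stand-ins rather than the perf
types. Note that this sketch takes the CPU as a signed int: with an
unsigned parameter a "cpu < 0" test can never be true, so the signedness
of the argument matters for the per-thread case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Index of each field inside one per-CPU metadata array (illustrative). */
enum { EX_META_CPU, EX_META_MIDR, EX_META_MAX };

struct ex_spe {
	uint64_t **metadata;    /* one array per CPU, or NULL */
	size_t metadata_nr_cpu; /* number of per-CPU arrays */
	bool is_homogeneous;    /* all CPUs share the same properties */
};

static uint64_t *ex_get_metadata_by_cpu(const struct ex_spe *spe, int cpu)
{
	size_t i;

	if (!spe->metadata)
		return NULL;

	/* Per-thread mode: no CPU is known, so only a homogeneous
	 * system lets us fall back to the first entry. */
	if (cpu < 0)
		return spe->is_homogeneous ? spe->metadata[0] : NULL;

	for (i = 0; i < spe->metadata_nr_cpu; i++)
		if (spe->metadata[i][EX_META_CPU] == (uint64_t)cpu)
			return spe->metadata[i];

	return NULL;
}
```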
* Re: [PATCH 11/12] perf arm_spe: Set HITM flag
2025-06-13 15:53 ` [PATCH 11/12] perf arm_spe: Set HITM flag Leo Yan
@ 2025-06-20 10:51 ` James Clark
0 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2025-06-20 10:51 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> Since FEAT_SPEv1p4, Arm SPE provides two extra events: "Cache data
> modified" and "Data snooped".
>
> Set the snoop mode as follows:
>
> - If both the "Cache data modified" event and the "Data snooped" event
> are set, which indicates a load operation that snooped from an outside
> cache and hit a modified copy, set the HITM flag so that false sharing
> can be inspected.
> - If the snooped event bit is not set, and the snooped event is
> supported by the hardware, set NONE mode (no snoop operation).
> - If the snooped event bit is not set, and the event is not supported or
> the events info is absent from the meta data, set NA mode (not
> available).
>
> Don't set any mode for a lone "Cache data modified" event, as it hits a
> local modified copy.
>
Reviewed-by: James Clark <james.clark@linaro.org>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 2 ++
> tools/perf/util/arm-spe.c | 26 +++++++++++++++++++++--
> 2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> index 90c76928c7bf1b35cec538abdb0e88d6083fe81b..a2b48b0c87712f232587023eeaa66a9b83aed382 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> @@ -28,6 +28,8 @@
> #define ARM_SPE_L2D_ACCESS BIT(EV_L2D_ACCESS)
> #define ARM_SPE_L2D_MISS BIT(EV_L2D_MISS)
> #define ARM_SPE_RECENTLY_FETCH BIT(EV_RECENTLY_FETCHED)
> +#define ARM_SPE_DATA_SNOOPED BIT(EV_DATA_SNOOPED)
> +#define ARM_SPE_HITM BIT(EV_CACHE_DATA_MODIFIED)
>
> enum arm_spe_op_type {
> /* First level operation type */
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 8e93b0d151a98714d0c5e5f6ceec386a2aa63ad0..8a889f727f9cd5351b4ca027935112eddd16ea6c 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -903,9 +903,12 @@ static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
> }
> }
>
> -static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
> +static void arm_spe__synth_memory_level(struct arm_spe_queue *speq,
> + const struct arm_spe_record *record,
> union perf_mem_data_src *data_src)
> {
> + struct arm_spe *spe = speq->spe;
> +
> if (data_src->mem_op == PERF_MEM_OP_LOAD)
> arm_spe__synth_ld_memory_level(record, data_src);
> if (data_src->mem_op == PERF_MEM_OP_STORE)
> @@ -916,6 +919,25 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
> }
>
> + if (record->type & ARM_SPE_DATA_SNOOPED) {
> + if (record->type & ARM_SPE_HITM)
> + data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
> + else
> + data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> + } else {
> + u64 *metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
> +
> + /*
> + * Set NA ("Not available") mode if no meta data or the
> + * SNOOPED event is not supported.
> + */
> + if (!metadata ||
> + !(metadata[ARM_SPE_CAP_EVENTS] & ARM_SPE_DATA_SNOOPED))
> + data_src->mem_snoop = PERF_MEM_SNOOP_NA;
> + else
> + data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
> + }
> +
> if (record->type & ARM_SPE_REMOTE_ACCESS)
> data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
> }
> @@ -971,7 +993,7 @@ static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
> return 0;
>
> if (!arm_spe__synth_ds(speq, record, &data_src))
> - arm_spe__synth_memory_level(record, &data_src);
> + arm_spe__synth_memory_level(speq, record, &data_src);
>
> if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
> data_src.mem_dtlb = PERF_MEM_TLB_WK;
>
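
The snoop decision in the diff above boils down to a small decision table,
sketched here as a pure function. The enum, the bit values and the
"have_caps"/"caps" parameters are inventions of this example (they stand in
for the metadata lookup and the ARM_SPE_CAP_EVENTS entry). The sketch
follows the quoted code, where a record carrying only the modified bit takes
the same path as a record with no snoop bit at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in event bits for this sketch only. */
#define EX_DATA_SNOOPED (1u << 0)
#define EX_HITM         (1u << 1)

enum ex_snoop { EX_SNOOP_NA, EX_SNOOP_NONE, EX_SNOOP_HIT, EX_SNOOP_HITM };

/*
 * type:      event bits of the sampled record
 * have_caps: whether metadata for the CPU was found
 * caps:      bitmap of events the hardware can report
 */
static enum ex_snoop ex_snoop_mode(uint32_t type, bool have_caps, uint64_t caps)
{
	if (type & EX_DATA_SNOOPED)
		return (type & EX_HITM) ? EX_SNOOP_HITM : EX_SNOOP_HIT;

	/* Without capability info we cannot tell "no snoop" from "unknown". */
	if (!have_caps || !(caps & EX_DATA_SNOOPED))
		return EX_SNOOP_NA;

	return EX_SNOOP_NONE;
}
```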
* Re: [PATCH 12/12] perf arm_spe: Allow parsing both data source and events
2025-06-13 15:53 ` [PATCH 12/12] perf arm_spe: Allow parsing both data source and events Leo Yan
@ 2025-06-20 10:55 ` James Clark
0 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2025-06-20 10:55 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Will Deacon, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 13/06/2025 4:53 pm, Leo Yan wrote:
> The current code skips parsing events after generating the data source.
> The reason is that data source packets carry cache and snooping related
> info, so the subsequent event packets might contain duplicate info.
>
> This commit continues parsing the events after data source analysis. If
> the data source does not provide the memory level and snooping types,
> then the event info is used to synthesize the related fields.
>
> As a result, both the peer snoop option ('-d peer') and the HITM options
> ('-d tot/lcl/rmt') are supported by Arm SPE in perf c2c.
>
Reviewed-by: James Clark <james.clark@linaro.org>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> tools/perf/util/arm-spe.c | 69 ++++++++++++++++++++++++++++-------------------
> 1 file changed, 41 insertions(+), 28 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 8a889f727f9cd5351b4ca027935112eddd16ea6c..8fde6f6cbce92aabf20d25b01ee2ade4aae7ea61 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -909,40 +909,54 @@ static void arm_spe__synth_memory_level(struct arm_spe_queue *speq,
> {
> struct arm_spe *spe = speq->spe;
>
> - if (data_src->mem_op == PERF_MEM_OP_LOAD)
> - arm_spe__synth_ld_memory_level(record, data_src);
> - if (data_src->mem_op == PERF_MEM_OP_STORE)
> - arm_spe__synth_st_memory_level(record, data_src);
> + /*
> + * The data source packet contains more info for cache levels for
> + * peer snooping. So respect the memory level if has been set by
> + * data source parsing.
> + */
> + if (!data_src->mem_lvl) {
> + if (data_src->mem_op == PERF_MEM_OP_LOAD)
> + arm_spe__synth_ld_memory_level(record, data_src);
> + if (data_src->mem_op == PERF_MEM_OP_STORE)
> + arm_spe__synth_st_memory_level(record, data_src);
> + }
>
> if (!data_src->mem_lvl) {
> data_src->mem_lvl = PERF_MEM_LVL_NA;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
> }
>
> - if (record->type & ARM_SPE_DATA_SNOOPED) {
> - if (record->type & ARM_SPE_HITM)
> - data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
> - else
> - data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> - } else {
> - u64 *metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
> -
> - /*
> - * Set NA ("Not available") mode if no meta data or the
> - * SNOOPED event is not supported.
> - */
> - if (!metadata ||
> - !(metadata[ARM_SPE_CAP_EVENTS] & ARM_SPE_DATA_SNOOPED))
> - data_src->mem_snoop = PERF_MEM_SNOOP_NA;
> - else
> - data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
> + /*
> + * If 'mem_snoop' has been set by data source packet, skip to set
> + * it at here.
> + */
> + if (!data_src->mem_snoop) {
> + if (record->type & ARM_SPE_DATA_SNOOPED) {
> + if (record->type & ARM_SPE_HITM)
> + data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
> + else
> + data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> + } else {
> + u64 *metadata =
> + arm_spe__get_metadata_by_cpu(spe, speq->cpu);
> +
> + /*
> + * Set NA ("Not available") mode if no meta data or the
> + * SNOOPED event is not supported.
> + */
> + if (!metadata ||
> + !(metadata[ARM_SPE_CAP_EVENTS] & ARM_SPE_DATA_SNOOPED))
> + data_src->mem_snoop = PERF_MEM_SNOOP_NA;
> + else
> + data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
> + }
> }
>
> if (record->type & ARM_SPE_REMOTE_ACCESS)
> data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
> }
>
> -static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
> +static void arm_spe__synth_ds(struct arm_spe_queue *speq,
> const struct arm_spe_record *record,
> union perf_mem_data_src *data_src)
> {
> @@ -961,19 +975,18 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
> } else {
> metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
> if (!metadata)
> - return false;
> + return;
>
> midr = metadata[ARM_SPE_CPU_MIDR];
> }
>
> for (i = 0; i < ARRAY_SIZE(data_source_handles); i++) {
> if (is_midr_in_range_list(midr, data_source_handles[i].midr_ranges)) {
> - data_source_handles[i].ds_synth(record, data_src);
> - return true;
> + return data_source_handles[i].ds_synth(record, data_src);
> }
> }
>
> - return false;
> + return;
> }
>
> static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
> @@ -992,8 +1005,8 @@ static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
> else
> return 0;
>
> - if (!arm_spe__synth_ds(speq, record, &data_src))
> - arm_spe__synth_memory_level(speq, record, &data_src);
> + arm_spe__synth_ds(speq, record, &data_src);
> + arm_spe__synth_memory_level(speq, record, &data_src);
>
> if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
> data_src.mem_dtlb = PERF_MEM_TLB_WK;
>
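
The resulting precedence (data source first, events only for fields that
are still empty) can be sketched as below. The struct, constants and
function name are hypothetical and much simpler than the perf
perf_mem_data_src union; the sketch only shows the "fill if zero" rule and
the NA fallback for the memory level.

```c
#include <stdint.h>

/* Illustrative values; zero means "not filled in yet". */
#define EX_LVL_NA     1u
#define EX_LVL_L1     2u
#define EX_LVL_L2     3u
#define EX_SNOOP_HITM 4u

struct ex_data_src {
	uint32_t mem_lvl;
	uint32_t mem_snoop;
};

/*
 * Called after data source parsing. ev_lvl and ev_snoop are what the
 * event bits alone would suggest (0 if they suggest nothing).
 */
static void ex_fill_from_events(struct ex_data_src *ds,
				uint32_t ev_lvl, uint32_t ev_snoop)
{
	/* Respect a level already chosen by the data source packet. */
	if (!ds->mem_lvl)
		ds->mem_lvl = ev_lvl ? ev_lvl : EX_LVL_NA;

	/* Likewise, only synthesize the snoop field if it is still unset. */
	if (!ds->mem_snoop)
		ds->mem_snoop = ev_snoop;
}
```

In this model a data source that reported L2 but no snoop information
keeps its L2 level and gains the snoop mode derived from the events.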
Thread overview: 28+ messages
2025-06-13 15:53 [PATCH 00/12] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
2025-06-13 15:53 ` [PATCH 01/12] drivers/perf: arm_spe: Store event reserved bits in driver data Leo Yan
2025-06-19 11:28 ` James Clark
2025-06-19 16:22 ` Leo Yan
2025-06-13 15:53 ` [PATCH 02/12] drivers/perf: arm_spe: Expose events capability Leo Yan
2025-06-19 11:32 ` James Clark
2025-06-19 16:24 ` Leo Yan
2025-06-13 15:53 ` [PATCH 03/12] perf arm_spe: Correct setting remote access Leo Yan
2025-06-19 13:53 ` James Clark
2025-06-19 16:45 ` Leo Yan
2025-06-13 15:53 ` [PATCH 04/12] perf arm_spe: Directly propagate raw event Leo Yan
2025-06-19 14:13 ` James Clark
2025-06-13 15:53 ` [PATCH 05/12] perf arm_spe: Decode event types for new features Leo Yan
2025-06-19 14:20 ` James Clark
2025-06-13 15:53 ` [PATCH 06/12] perf arm_spe: Add "events" entry in meta data Leo Yan
2025-06-19 15:46 ` James Clark
2025-06-13 15:53 ` [PATCH 07/12] perf arm_spe: Refine memory level filling Leo Yan
2025-06-20 10:27 ` James Clark
2025-06-13 15:53 ` [PATCH 08/12] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
2025-06-20 10:30 ` James Clark
2025-06-13 15:53 ` [PATCH 09/12] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
2025-06-20 10:37 ` James Clark
2025-06-13 15:53 ` [PATCH 10/12] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
2025-06-20 10:45 ` James Clark
2025-06-13 15:53 ` [PATCH 11/12] perf arm_spe: Set HITM flag Leo Yan
2025-06-20 10:51 ` James Clark
2025-06-13 15:53 ` [PATCH 12/12] perf arm_spe: Allow parsing both data source and events Leo Yan
2025-06-20 10:55 ` James Clark