* [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4
@ 2025-07-07 13:39 Leo Yan
2025-07-07 13:39 ` [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter Leo Yan
` (13 more replies)
0 siblings, 14 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
This series adds support for new event types introduced in Arm SPE v1.4.
The first patch modifies the Arm SPE driver to expose an 'event_filter'
entry in the sysfs caps folder. This allows users to discover which
events the hardware can filter.
Patch 02 fixes the setting of the remote bit. Patch 03 fixes the memory
level info for remote access.
Patch 04 refactors the code to use the full type for data_src.
Patch 05 refactors the code to avoid duplicate definitions of event
bits.
Patch 06 dumps new event bits in raw format via the 'perf script -D'
command.
Patches 07 to 13 enhance memory-level information based on the new
events introduced in FEAT_SPEv1p4.
Patch 14 changes the logic to parse events after data source analysis.
The event information complements the data source and provides a more
complete view. As a result, Arm SPE can now support both HITM and peer
modes (See the "--display" options in perf c2c).
This series has been tested on the FVP RevC platform.
Note: for a local HITM event, the emulation does not provide any info
for LLC. However, the perf c2c tool relies on LLC + HITM for accounting
local HITM. I had to manually set the LLC HIT flag to verify the
"perf c2c -d tot" command.
---
Changes in v3:
- Retrieve CPU number from PMU type (Ian).
- Link to v2: https://lore.kernel.org/r/20250630-arm_spe_support_hitm_overhead_v1_public-v2-0-2e1afab313b9@arm.com
Changes in v2:
- Dropped the kernel change for caching "pmsevfr_res0" (James)
- Renamed the "events" entry to "event_filter" (James)
- Added a new refactoring patch 04 (James)
- Updated memory level info for remote access (James)
- Link to v1: https://lore.kernel.org/r/20250613-arm_spe_support_hitm_overhead_v1_public-v1-0-6faecf0a8775@arm.com
---
James Clark (1):
perf arm_spe: Use full type for data_src
Leo Yan (13):
drivers/perf: arm_spe: Expose event filter
perf arm_spe: Correct setting remote access
perf arm_spe: Correct memory level for remote access
perf arm_spe: Directly propagate raw event
perf arm_spe: Decode event types for new features
perf arm_spe: Add "event_filter" entry in meta data
perf arm_spe: Refine memory level filling
perf arm_spe: Separate setting of memory levels for loads and stores
perf arm_spe: Fill memory levels for FEAT_SPEv1p4
perf arm_spe: Improve CPU number retrieving in per-thread mode
perf arm_spe: Refactor arm_spe__get_metadata_by_cpu()
perf arm_spe: Set HITM flag
perf arm_spe: Allow parsing both data source and events
drivers/perf/arm_spe_pmu.c | 36 ++--
tools/perf/arch/arm64/util/arm-spe.c | 5 +
tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 37 +---
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 33 ++--
.../util/arm-spe-decoder/arm-spe-pkt-decoder.c | 14 ++
.../util/arm-spe-decoder/arm-spe-pkt-decoder.h | 7 +
tools/perf/util/arm-spe.c | 220 ++++++++++++++++-----
tools/perf/util/arm-spe.h | 2 +
8 files changed, 234 insertions(+), 120 deletions(-)
---
base-commit: d7b8f8e20813f0179d8ef519541a3527e7661d3a
change-id: 20250610-arm_spe_support_hitm_overhead_v1_public-c4a263385434
Best regards,
--
Leo Yan <leo.yan@arm.com>
* [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-14 13:07 ` Will Deacon
2025-07-07 13:39 ` [PATCH v3 02/14] perf arm_spe: Correct setting remote access Leo Yan
` (12 subsequent siblings)
13 siblings, 1 reply; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Expose an "event_filter" entry in the caps folder to inform user space
about which events can be filtered.
Change the return type of arm_spe_pmu_cap_get() from u32 to u64 to
accommodate the added event filter entry.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
drivers/perf/arm_spe_pmu.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 3efed8839a4ec5604eba242cb620327cd2a6a87d..78d8cb59b66d7bc6319eb4ee40e6d2d32ffb8bdf 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -115,6 +115,7 @@ enum arm_spe_pmu_capabilities {
SPE_PMU_CAP_FEAT_MAX,
SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX,
SPE_PMU_CAP_MIN_IVAL,
+ SPE_PMU_CAP_EVENT_FILTER,
};
static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
@@ -122,7 +123,21 @@ static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
[SPE_PMU_CAP_ERND] = SPE_PMU_FEAT_ERND,
};
-static u32 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
+static u64 arm_spe_pmsevfr_res0(u16 pmsver)
+{
+ switch (pmsver) {
+ case ID_AA64DFR0_EL1_PMSVer_IMP:
+ return PMSEVFR_EL1_RES0_IMP;
+ case ID_AA64DFR0_EL1_PMSVer_V1P1:
+ return PMSEVFR_EL1_RES0_V1P1;
+ case ID_AA64DFR0_EL1_PMSVer_V1P2:
+ /* Return the highest version we support in default */
+ default:
+ return PMSEVFR_EL1_RES0_V1P2;
+ }
+}
+
+static u64 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
{
if (cap < SPE_PMU_CAP_FEAT_MAX)
return !!(spe_pmu->features & arm_spe_pmu_feat_caps[cap]);
@@ -132,6 +147,8 @@ static u32 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
return spe_pmu->counter_sz;
case SPE_PMU_CAP_MIN_IVAL:
return spe_pmu->min_period;
+ case SPE_PMU_CAP_EVENT_FILTER:
+ return ~arm_spe_pmsevfr_res0(spe_pmu->pmsver);
default:
WARN(1, "unknown cap %d\n", cap);
}
@@ -148,7 +165,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
container_of(attr, struct dev_ext_attribute, attr);
int cap = (long)ea->var;
- return sysfs_emit(buf, "%u\n", arm_spe_pmu_cap_get(spe_pmu, cap));
+ return sysfs_emit(buf, "%llu\n", arm_spe_pmu_cap_get(spe_pmu, cap));
}
#define SPE_EXT_ATTR_ENTRY(_name, _func, _var) \
@@ -164,6 +181,7 @@ static struct attribute *arm_spe_pmu_cap_attr[] = {
SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
+ SPE_CAP_EXT_ATTR_ENTRY(event_filter, SPE_PMU_CAP_EVENT_FILTER),
NULL,
};
@@ -693,20 +711,6 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
return IRQ_HANDLED;
}
-static u64 arm_spe_pmsevfr_res0(u16 pmsver)
-{
- switch (pmsver) {
- case ID_AA64DFR0_EL1_PMSVer_IMP:
- return PMSEVFR_EL1_RES0_IMP;
- case ID_AA64DFR0_EL1_PMSVer_V1P1:
- return PMSEVFR_EL1_RES0_V1P1;
- case ID_AA64DFR0_EL1_PMSVer_V1P2:
- /* Return the highest version we support in default */
- default:
- return PMSEVFR_EL1_RES0_V1P2;
- }
-}
-
/* Perf callbacks */
static int arm_spe_pmu_event_init(struct perf_event *event)
{
--
2.34.1
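For readers following along, the relation between the RES0 mask and the
exposed "event_filter" value can be sketched in plain C as below. This is
only an illustration under stated assumptions: the example_* names, the
version numbers and the mask values are made up for the sketch; the real
PMSEVFR_EL1_RES0_* values come from the kernel headers.

```c
#include <stdint.h>

/* Hypothetical version numbers, for illustration only. */
enum example_pmsver {
	EXAMPLE_PMSVER_IMP  = 1,
	EXAMPLE_PMSVER_V1P1 = 2,
	EXAMPLE_PMSVER_V1P2 = 3,
};

/* Made-up RES0 masks; newer versions reserve fewer bits. */
#define EXAMPLE_RES0_IMP	UINT64_C(0x0000ffffffff0000)
#define EXAMPLE_RES0_V1P1	UINT64_C(0x0000fffffffc0000)
#define EXAMPLE_RES0_V1P2	UINT64_C(0x0000fffffff00000)

/* Same shape as the driver helper: unknown versions use the newest mask. */
static uint64_t example_pmsevfr_res0(unsigned int pmsver)
{
	switch (pmsver) {
	case EXAMPLE_PMSVER_IMP:
		return EXAMPLE_RES0_IMP;
	case EXAMPLE_PMSVER_V1P1:
		return EXAMPLE_RES0_V1P1;
	case EXAMPLE_PMSVER_V1P2:
	default:
		return EXAMPLE_RES0_V1P2;
	}
}

/* Filterable events are exactly the bits that are not reserved. */
static uint64_t example_event_filter(unsigned int pmsver)
{
	return ~example_pmsevfr_res0(pmsver);
}
```

The complement keeps the sysfs entry in step with the RES0 definitions, so
a bit that becomes valid in a later SPE version appears in the filter
without any extra bookkeeping.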
* [PATCH v3 02/14] perf arm_spe: Correct setting remote access
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
2025-07-07 13:39 ` [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 03/14] perf arm_spe: Correct memory level for " Leo Yan
` (11 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Set the mem_remote field for a remote access to appropriately represent
the event.
Fixes: a89dbc9b988f ("perf arm-spe: Set sample's data source field")
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index d46e0cccac99a36148b4daa37f2bf2342e6b47ef..fdef6f743cf3c76b1dcdd57f5a2c297642fdd21a 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -839,7 +839,7 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
}
if (record->type & ARM_SPE_REMOTE_ACCESS)
- data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
+ data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
}
static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
--
2.34.1
* [PATCH v3 03/14] perf arm_spe: Correct memory level for remote access
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
2025-07-07 13:39 ` [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter Leo Yan
2025-07-07 13:39 ` [PATCH v3 02/14] perf arm_spe: Correct setting remote access Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 04/14] perf arm_spe: Use full type for data_src Leo Yan
` (10 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
For remote accesses, the data source packet does not contain information
about the memory level. To avoid misinformation, set the memory level to
NA (Not Available).
Fixes: 4e6430cbb1a9 ("perf arm-spe: Use SPE data source for neoverse cores")
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index fdef6f743cf3c76b1dcdd57f5a2c297642fdd21a..182e2c604ea49790a2f5341304ef1cd8217ea6a3 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -670,8 +670,8 @@ static void arm_spe__synth_data_source_common(const struct arm_spe_record *recor
* socket
*/
case ARM_SPE_COMMON_DS_REMOTE:
- data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1;
- data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
+ data_src->mem_lvl = PERF_MEM_LVL_NA;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
break;
--
2.34.1
* [PATCH v3 04/14] perf arm_spe: Use full type for data_src
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (2 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 03/14] perf arm_spe: Correct memory level for " Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 05/14] perf arm_spe: Directly propagate raw event Leo Yan
` (9 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
From: James Clark <james.clark@linaro.org>
data_src has an actual type rather than just being a u64. To help
readers, delay decomposing it to a u64 until it's finally assigned to
the sample.
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 182e2c604ea49790a2f5341304ef1cd8217ea6a3..fec11322690ec156500335c86812f55b1bcfb4bb 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -471,7 +471,8 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
}
static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
- u64 spe_events_id, u64 data_src)
+ u64 spe_events_id,
+ union perf_mem_data_src data_src)
{
struct arm_spe *spe = speq->spe;
struct arm_spe_record *record = &speq->decoder->record;
@@ -486,7 +487,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
sample.stream_id = spe_events_id;
sample.addr = record->virt_addr;
sample.phys_addr = record->phys_addr;
- sample.data_src = data_src;
+ sample.data_src = data_src.val;
sample.weight = record->latency;
ret = arm_spe_deliver_synth_event(spe, speq, event, &sample);
@@ -519,7 +520,8 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
}
static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
- u64 spe_events_id, u64 data_src)
+ u64 spe_events_id,
+ union perf_mem_data_src data_src)
{
struct arm_spe *spe = speq->spe;
struct arm_spe_record *record = &speq->decoder->record;
@@ -542,7 +544,7 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
sample.stream_id = spe_events_id;
sample.addr = record->to_ip;
sample.phys_addr = record->phys_addr;
- sample.data_src = data_src;
+ sample.data_src = data_src.val;
sample.period = spe->instructions_sample_period;
sample.weight = record->latency;
sample.flags = speq->flags;
@@ -891,21 +893,22 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
return false;
}
-static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
- const struct arm_spe_record *record)
+static union perf_mem_data_src
+arm_spe__synth_data_source(struct arm_spe_queue *speq,
+ const struct arm_spe_record *record)
{
- union perf_mem_data_src data_src = { .mem_op = PERF_MEM_OP_NA };
+ union perf_mem_data_src data_src = {};
/* Only synthesize data source for LDST operations */
if (!is_ldst_op(record->op))
- return 0;
+ return data_src;
if (record->op & ARM_SPE_OP_LD)
data_src.mem_op = PERF_MEM_OP_LOAD;
else if (record->op & ARM_SPE_OP_ST)
data_src.mem_op = PERF_MEM_OP_STORE;
else
- return 0;
+ return data_src;
if (!arm_spe__synth_ds(speq, record, &data_src))
arm_spe__synth_memory_level(record, &data_src);
@@ -919,14 +922,14 @@ static u64 arm_spe__synth_data_source(struct arm_spe_queue *speq,
data_src.mem_dtlb |= PERF_MEM_TLB_HIT;
}
- return data_src.val;
+ return data_src;
}
static int arm_spe_sample(struct arm_spe_queue *speq)
{
const struct arm_spe_record *record = &speq->decoder->record;
struct arm_spe *spe = speq->spe;
- u64 data_src;
+ union perf_mem_data_src data_src;
int err;
arm_spe__sample_flags(speq);
--
2.34.1
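The idea of this refactoring, carrying the typed union around and only
collapsing it to a u64 at the last moment, can be sketched as follows.
The union below is a simplified stand-in, not the real layout of
union perf_mem_data_src, and all example_* names and constants are
assumptions made for the illustration.

```c
#include <stdint.h>

/* Simplified stand-in for union perf_mem_data_src (layout is illustrative). */
union example_data_src {
	uint64_t val;
	struct {
		uint64_t mem_op  : 5;
		uint64_t mem_lvl : 14;
		uint64_t rsvd    : 45;
	};
};

/* Illustrative flag values, not the perf UAPI constants. */
#define EXAMPLE_OP_LOAD		0x02
#define EXAMPLE_LVL_HIT		0x02
#define EXAMPLE_LVL_L1		0x08

struct example_sample {
	uint64_t data_src;
};

/* Build the data source with named fields; no raw u64 in sight. */
static union example_data_src example_synth_data_source(int is_load)
{
	union example_data_src d = { .val = 0 };

	if (!is_load)
		return d;

	d.mem_op = EXAMPLE_OP_LOAD;
	d.mem_lvl = EXAMPLE_LVL_L1 | EXAMPLE_LVL_HIT;
	return d;
}

/* Decompose to a u64 only when the value is stored in the sample. */
static void example_fill_sample(struct example_sample *sample,
				union example_data_src d)
{
	sample->data_src = d.val;
}
```

Callers in between can still inspect d.mem_op or d.mem_lvl by name, which
is what later patches in a series like this one would rely on.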
* [PATCH v3 05/14] perf arm_spe: Directly propagate raw event
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (3 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 04/14] perf arm_spe: Use full type for data_src Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 06/14] perf arm_spe: Decode event types for new features Leo Yan
` (8 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Two sets of event bits are defined: one for generating samples and
another, the raw event bits, used in the backend decoder. Reduce the
redundancy by using the raw event bits directly in the frontend code.
To avoid overflow issues, change the type of the event variable from
enum to u64.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 37 +----------------------
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 28 ++++++++---------
2 files changed, 14 insertions(+), 51 deletions(-)
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 688fe6d7524416420a7c18d5f8a268492ce7c3b8..96eb7cced6fd1574f5d823e4c67b9051dcf183ed 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -229,42 +229,7 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
}
break;
case ARM_SPE_EVENTS:
- if (payload & BIT(EV_L1D_REFILL))
- decoder->record.type |= ARM_SPE_L1D_MISS;
-
- if (payload & BIT(EV_L1D_ACCESS))
- decoder->record.type |= ARM_SPE_L1D_ACCESS;
-
- if (payload & BIT(EV_TLB_WALK))
- decoder->record.type |= ARM_SPE_TLB_MISS;
-
- if (payload & BIT(EV_TLB_ACCESS))
- decoder->record.type |= ARM_SPE_TLB_ACCESS;
-
- if (payload & BIT(EV_LLC_MISS))
- decoder->record.type |= ARM_SPE_LLC_MISS;
-
- if (payload & BIT(EV_LLC_ACCESS))
- decoder->record.type |= ARM_SPE_LLC_ACCESS;
-
- if (payload & BIT(EV_REMOTE_ACCESS))
- decoder->record.type |= ARM_SPE_REMOTE_ACCESS;
-
- if (payload & BIT(EV_MISPRED))
- decoder->record.type |= ARM_SPE_BRANCH_MISS;
-
- if (payload & BIT(EV_NOT_TAKEN))
- decoder->record.type |= ARM_SPE_BRANCH_NOT_TAKEN;
-
- if (payload & BIT(EV_TRANSACTIONAL))
- decoder->record.type |= ARM_SPE_IN_TXN;
-
- if (payload & BIT(EV_PARTIAL_PREDICATE))
- decoder->record.type |= ARM_SPE_SVE_PARTIAL_PRED;
-
- if (payload & BIT(EV_EMPTY_PREDICATE))
- decoder->record.type |= ARM_SPE_SVE_EMPTY_PRED;
-
+ decoder->record.type = payload;
break;
case ARM_SPE_DATA_SOURCE:
decoder->record.source = payload;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 881d9f29c1380b62486f0cd81498750ba06c4b50..03da55453da8fd2e7b9e2dcba3ddcf5243599e1c 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -13,20 +13,18 @@
#include "arm-spe-pkt-decoder.h"
-enum arm_spe_sample_type {
- ARM_SPE_L1D_ACCESS = 1 << 0,
- ARM_SPE_L1D_MISS = 1 << 1,
- ARM_SPE_LLC_ACCESS = 1 << 2,
- ARM_SPE_LLC_MISS = 1 << 3,
- ARM_SPE_TLB_ACCESS = 1 << 4,
- ARM_SPE_TLB_MISS = 1 << 5,
- ARM_SPE_BRANCH_MISS = 1 << 6,
- ARM_SPE_REMOTE_ACCESS = 1 << 7,
- ARM_SPE_SVE_PARTIAL_PRED = 1 << 8,
- ARM_SPE_SVE_EMPTY_PRED = 1 << 9,
- ARM_SPE_BRANCH_NOT_TAKEN = 1 << 10,
- ARM_SPE_IN_TXN = 1 << 11,
-};
+#define ARM_SPE_L1D_ACCESS BIT(EV_L1D_ACCESS)
+#define ARM_SPE_L1D_MISS BIT(EV_L1D_REFILL)
+#define ARM_SPE_LLC_ACCESS BIT(EV_LLC_ACCESS)
+#define ARM_SPE_LLC_MISS BIT(EV_LLC_MISS)
+#define ARM_SPE_TLB_ACCESS BIT(EV_TLB_ACCESS)
+#define ARM_SPE_TLB_MISS BIT(EV_TLB_WALK)
+#define ARM_SPE_BRANCH_MISS BIT(EV_MISPRED)
+#define ARM_SPE_BRANCH_NOT_TAKEN BIT(EV_NOT_TAKEN)
+#define ARM_SPE_REMOTE_ACCESS BIT(EV_REMOTE_ACCESS)
+#define ARM_SPE_SVE_PARTIAL_PRED BIT(EV_PARTIAL_PREDICATE)
+#define ARM_SPE_SVE_EMPTY_PRED BIT(EV_EMPTY_PREDICATE)
+#define ARM_SPE_IN_TXN BIT(EV_TRANSACTIONAL)
enum arm_spe_op_type {
/* First level operation type */
@@ -100,7 +98,7 @@ enum arm_spe_hisi_hip_data_source {
};
struct arm_spe_record {
- enum arm_spe_sample_type type;
+ u64 type;
int err;
u32 op;
u32 latency;
--
2.34.1
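A small standalone sketch of why the record type moves to u64 once the
raw payload is stored directly: event bits above 31 would be lost in a
32-bit container. The names and bit positions below (including the
bit-40 event) are assumptions chosen for the example, not definitions
taken from the decoder headers.

```c
#include <stdint.h>

/* 64-bit safe bit helper; a plain (1 << n) is only an int. */
#define EXAMPLE_BIT(n)		(UINT64_C(1) << (n))

/* Illustrative raw event bit positions. */
enum example_events {
	EXAMPLE_EV_L1D_ACCESS = 2,
	EXAMPLE_EV_L1D_REFILL = 3,
	EXAMPLE_EV_HIGH       = 40,	/* hypothetical high event bit */
};

/* Frontend names are just aliases of the raw bits. */
#define EXAMPLE_L1D_ACCESS	EXAMPLE_BIT(EXAMPLE_EV_L1D_ACCESS)
#define EXAMPLE_L1D_MISS	EXAMPLE_BIT(EXAMPLE_EV_L1D_REFILL)
#define EXAMPLE_HIGH		EXAMPLE_BIT(EXAMPLE_EV_HIGH)

struct example_record {
	uint64_t type;	/* wide enough for the whole events payload */
};

/* The decoder stores the payload as is; no per-bit translation. */
static void example_read_events(struct example_record *rec, uint64_t payload)
{
	rec->type = payload;
}
```

With a single set of definitions, adding a new event only needs a new
enum entry and, if the frontend cares about it, one alias.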
* [PATCH v3 06/14] perf arm_spe: Decode event types for new features
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (4 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 05/14] perf arm_spe: Directly propagate raw event Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 07/14] perf arm_spe: Add "event_filter" entry in meta data Leo Yan
` (7 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Decode new event types introduced by FEAT_SPEv1p4 and FEAT_SPE_SME.
The printed event names don't strictly follow the naming in the Arm ARM.
For example, the "Cache data modified" event is shown as "HITM", and the
"Data snooped" event is printed as "SNOOPED". Shorter names are easier
to read while preserving core meanings.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 14 ++++++++++++++
tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h | 7 +++++++
2 files changed, 21 insertions(+)
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
index 13cadb2f1ceac7a90e359c4d6aa1d5fc5169e142..80561630253dd5c46f7e99b24fc13b99f346459f 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
@@ -314,6 +314,20 @@ static int arm_spe_pkt_desc_event(const struct arm_spe_pkt *packet,
arm_spe_pkt_out_string(&err, &buf, &buf_len, " SVE-PARTIAL-PRED");
if (payload & BIT(EV_EMPTY_PREDICATE))
arm_spe_pkt_out_string(&err, &buf, &buf_len, " SVE-EMPTY-PRED");
+ if (payload & BIT(EV_L2D_ACCESS))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " L2D-ACCESS");
+ if (payload & BIT(EV_L2D_MISS))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " L2D-MISS");
+ if (payload & BIT(EV_CACHE_DATA_MODIFIED))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " HITM");
+ if (payload & BIT(EV_RECENTLY_FETCHED))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " LFB");
+ if (payload & BIT(EV_DATA_SNOOPED))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " SNOOPED");
+ if (payload & BIT(EV_STREAMING_SVE_MODE))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " STREAMING-SVE");
+ if (payload & BIT(EV_SMCU))
+ arm_spe_pkt_out_string(&err, &buf, &buf_len, " SMCU");
return err;
}
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index 2cdf9f6da2681244291445d54c5b13fe8a2e9d9a..d00c2481712dcc457eab2f5e9848ffc3150e6236 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -108,6 +108,13 @@ enum arm_spe_events {
EV_TRANSACTIONAL = 16,
EV_PARTIAL_PREDICATE = 17,
EV_EMPTY_PREDICATE = 18,
+ EV_L2D_ACCESS = 19,
+ EV_L2D_MISS = 20,
+ EV_CACHE_DATA_MODIFIED = 21,
+ EV_RECENTLY_FETCHED = 22,
+ EV_DATA_SNOOPED = 23,
+ EV_STREAMING_SVE_MODE = 24,
+ EV_SMCU = 25,
};
/* Operation packet header */
--
2.34.1
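As an aside, the mapping from the new event bits to the short names can
be pictured with a table-driven sketch like the one below. It reuses the
bit numbers and strings from this patch, but the function, the table and
the error convention are hypothetical and are not how the packet decoder
is structured.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct example_ev_name {
	unsigned int bit;
	const char *name;
};

/* Bit numbers and short names as added by this patch. */
static const struct example_ev_name example_new_events[] = {
	{ 19, "L2D-ACCESS" },
	{ 20, "L2D-MISS" },
	{ 21, "HITM" },
	{ 22, "LFB" },
	{ 23, "SNOOPED" },
	{ 24, "STREAMING-SVE" },
	{ 25, "SMCU" },
};

/* Append " NAME" for each set bit; returns 0 on success, -1 if buf is too small. */
static int example_desc_events(uint64_t payload, char *buf, size_t len)
{
	size_t count = sizeof(example_new_events) / sizeof(example_new_events[0]);
	size_t used = 0;
	size_t i;

	if (!len)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < count; i++) {
		int n;

		if (!(payload & (UINT64_C(1) << example_new_events[i].bit)))
			continue;

		n = snprintf(buf + used, len - used, " %s",
			     example_new_events[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}

	return 0;
}
```

For a payload with bits 21 and 23 set, the buffer ends up holding
" HITM SNOOPED", matching the style of the raw dump.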
* [PATCH v3 07/14] perf arm_spe: Add "event_filter" entry in meta data
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (5 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 06/14] perf arm_spe: Decode event types for new features Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 08/14] perf arm_spe: Refine memory level filling Leo Yan
` (6 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Add a new "event_filter" entry in the meta data and dump it in raw data
mode.
After:
# perf script -D
...
0 0 0x470 [0x1f0]: PERF_RECORD_AUXTRACE_INFO type: 4
Header version :2
Header size :4
PMU type v2 :11
CPU number :8
Magic :0x1010101010101010
CPU # :0
Num of params :4
MIDR :0x410fd0f0
PMU Type :11
Min Interval :256
Event Filter :0xffff000003fefffe
...
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/arch/arm64/util/arm-spe.c | 5 +++++
tools/perf/util/arm-spe.c | 1 +
tools/perf/util/arm-spe.h | 2 ++
3 files changed, 8 insertions(+)
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 4f2833b62ff55f3fd1dff3f032d6e06528460939..258b5d7b57b1d100bf334ff468c13fb4be9ed78f 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -121,12 +121,17 @@ static int arm_spe_save_cpu_header(struct auxtrace_record *itr,
/* No Arm SPE PMU is found */
data[ARM_SPE_CPU_PMU_TYPE] = ULLONG_MAX;
data[ARM_SPE_CAP_MIN_IVAL] = 0;
+ data[ARM_SPE_CAP_EVENT_FILTER] = 0;
} else {
data[ARM_SPE_CPU_PMU_TYPE] = pmu->type;
if (perf_pmu__scan_file(pmu, "caps/min_interval", "%lu", &val) != 1)
val = 0;
data[ARM_SPE_CAP_MIN_IVAL] = val;
+
+ if (perf_pmu__scan_file(pmu, "caps/event_filter", "%lu", &val) != 1)
+ val = 0;
+ data[ARM_SPE_CAP_EVENT_FILTER] = val;
}
free(cpuid);
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index fec11322690ec156500335c86812f55b1bcfb4bb..ec6a81a497c54d2418b885008685f0f74e00500e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -1535,6 +1535,7 @@ static const char * const metadata_per_cpu_fmts[] = {
[ARM_SPE_CPU_MIDR] = " MIDR :0x%"PRIx64"\n",
[ARM_SPE_CPU_PMU_TYPE] = " PMU Type :%"PRId64"\n",
[ARM_SPE_CAP_MIN_IVAL] = " Min Interval :%"PRId64"\n",
+ [ARM_SPE_CAP_EVENT_FILTER] = " Event Filter :0x%"PRIx64"\n",
};
static void arm_spe_print_info(struct arm_spe *spe, __u64 *arr)
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 390679a4af2fb61419bc881b5dc43c01f1dd77d7..3966df1856d8234bb5fe580c3f128c4d238c6221 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -47,6 +47,8 @@ enum {
ARM_SPE_CPU_PMU_TYPE,
/* Minimal interval */
ARM_SPE_CAP_MIN_IVAL,
+ /* Event filter */
+ ARM_SPE_CAP_EVENT_FILTER,
ARM_SPE_CPU_PRIV_MAX,
};
--
2.34.1
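To show how a consumer might use the recorded "Event Filter" value, here
is a minimal userspace sketch. It assumes the value arrives as text (as
from the caps file) and falls back to 0 when it cannot be parsed, which
mirrors the fallback in this patch; the helper names are hypothetical
and not part of perf.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse the filter text; base 0 accepts both decimal and 0x-prefixed hex. */
static uint64_t example_parse_event_filter(const char *text)
{
	unsigned long long val;
	char *end;

	errno = 0;
	val = strtoull(text, &end, 0);
	if (errno || end == text)
		return 0;	/* treat an unreadable entry as "no filter info" */

	return val;
}

/* True if event bit 'bit' is advertised in the filter mask. */
static bool example_has_event(uint64_t filter, unsigned int bit)
{
	return bit < 64 && ((filter >> bit) & 1);
}
```

With the value from the dump above (0xffff000003fefffe), bit 21 is set
while bit 16 is clear, so a tool could tell that the "Cache data
modified" event is available on that CPU.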
* [PATCH v3 08/14] perf arm_spe: Refine memory level filling
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (6 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 07/14] perf arm_spe: Add "event_filter" entry in meta data Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 09/14] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
` (5 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Introduce macros for detecting the cache level and cache misses.
Populate the 'mem_lvl_num' field, a later-added attribute for
representing the memory level. Set the memory levels to NA ("not
available") if memory hierarchy info is absent.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index ec6a81a497c54d2418b885008685f0f74e00500e..eddd554c49bcd873f6095bb262090786b1db5355 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -39,6 +39,15 @@
#define is_ldst_op(op) (!!((op) & ARM_SPE_OP_LDST))
+#define ARM_SPE_CACHE_EVENT(lvl) \
+ (ARM_SPE_##lvl##_ACCESS | ARM_SPE_##lvl##_MISS)
+
+#define arm_spe_is_cache_level(type, lvl) \
+ ((type) & ARM_SPE_CACHE_EVENT(lvl))
+
+#define arm_spe_is_cache_miss(type, lvl) \
+ ((type) & ARM_SPE_##lvl##_MISS)
+
struct arm_spe {
struct auxtrace auxtrace;
struct auxtrace_queues queues;
@@ -824,20 +833,21 @@ static const struct data_source_handle data_source_handles[] = {
static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
union perf_mem_data_src *data_src)
{
- if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
+ if (arm_spe_is_cache_level(record->type, LLC)) {
data_src->mem_lvl = PERF_MEM_LVL_L3;
-
- if (record->type & ARM_SPE_LLC_MISS)
- data_src->mem_lvl |= PERF_MEM_LVL_MISS;
- else
- data_src->mem_lvl |= PERF_MEM_LVL_HIT;
- } else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
+ data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
+ PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ } else if (arm_spe_is_cache_level(record->type, L1D)) {
data_src->mem_lvl = PERF_MEM_LVL_L1;
+ data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L1D) ?
+ PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ }
- if (record->type & ARM_SPE_L1D_MISS)
- data_src->mem_lvl |= PERF_MEM_LVL_MISS;
- else
- data_src->mem_lvl |= PERF_MEM_LVL_HIT;
+ if (!data_src->mem_lvl) {
+ data_src->mem_lvl = PERF_MEM_LVL_NA;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
}
if (record->type & ARM_SPE_REMOTE_ACCESS)
--
2.34.1
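The token-pasting macros are easier to follow in isolation. The sketch
below rebuilds them with stand-in bit values and a tiny result struct
instead of perf_mem_data_src; every EX_/ex_ name and value is an
assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in event bits (values are illustrative). */
#define EX_L1D_ACCESS	(UINT64_C(1) << 2)
#define EX_L1D_MISS	(UINT64_C(1) << 3)
#define EX_LLC_ACCESS	(UINT64_C(1) << 8)
#define EX_LLC_MISS	(UINT64_C(1) << 9)

/* 'lvl' is pasted into the names, e.g. LLC -> EX_LLC_ACCESS | EX_LLC_MISS. */
#define EX_CACHE_EVENT(lvl)	(EX_##lvl##_ACCESS | EX_##lvl##_MISS)
#define ex_is_cache_level(type, lvl)	((type) & EX_CACHE_EVENT(lvl))
#define ex_is_cache_miss(type, lvl)	((type) & EX_##lvl##_MISS)

enum ex_level { EX_LVL_NA, EX_LVL_L1, EX_LVL_L3 };

struct ex_result {
	enum ex_level level;
	bool miss;
};

/* Highest level first; fall back to NA when no cache event is present. */
static struct ex_result ex_memory_level(uint64_t type)
{
	struct ex_result r = { EX_LVL_NA, false };

	if (ex_is_cache_level(type, LLC)) {
		r.level = EX_LVL_L3;
		r.miss = !!ex_is_cache_miss(type, LLC);
	} else if (ex_is_cache_level(type, L1D)) {
		r.level = EX_LVL_L1;
		r.miss = !!ex_is_cache_miss(type, L1D);
	}

	return r;
}
```

A record with no cache event bits stays at EX_LVL_NA, which corresponds
to the NA fallback described in the commit message.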
* [PATCH v3 09/14] perf arm_spe: Separate setting of memory levels for loads and stores
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (7 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 08/14] perf arm_spe: Refine memory level filling Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 10/14] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
` (4 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
For a load hit, the lowest-level cache that hits reflects the latency of
fetching the data. Otherwise, the highest-level cache involved in the
refill indicates the overhead caused by the load.
Store operations remain unchanged to keep the descending order when
iterating through cache levels.
Split the code into two functions: one sets memory levels for loads and
the other for stores.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index eddd554c49bcd873f6095bb262090786b1db5355..688f6cd0f739e2b5f23a7776a7f2ebc97c12a2dd 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -45,6 +45,9 @@
#define arm_spe_is_cache_level(type, lvl) \
((type) & ARM_SPE_CACHE_EVENT(lvl))
+#define arm_spe_is_cache_hit(type, lvl) \
+ (((type) & ARM_SPE_CACHE_EVENT(lvl)) == ARM_SPE_##lvl##_ACCESS)
+
#define arm_spe_is_cache_miss(type, lvl) \
((type) & ARM_SPE_##lvl##_MISS)
@@ -830,9 +833,38 @@ static const struct data_source_handle data_source_handles[] = {
DS(hisi_hip_ds_encoding_cpus, data_source_hisi_hip),
};
-static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
- union perf_mem_data_src *data_src)
+static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
+ union perf_mem_data_src *data_src)
+{
+ /*
+ * To find a cache hit, search in ascending order from the lower level
+ * caches to the higher level caches. This reflects the best scenario
+ * for a cache hit.
+ */
+ if (arm_spe_is_cache_hit(record->type, L1D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ } else if (arm_spe_is_cache_hit(record->type, LLC)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ /*
+ * To find a cache miss, search in descending order from the higher
+ * level cache to the lower level cache. This represents the worst
+ * scenario for a cache miss.
+ */
+ } else if (arm_spe_is_cache_miss(record->type, LLC)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_MISS;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ } else if (arm_spe_is_cache_miss(record->type, L1D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_MISS;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ }
+}
+
+static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
+ union perf_mem_data_src *data_src)
{
+ /* Record the greatest level info for a store operation. */
if (arm_spe_is_cache_level(record->type, LLC)) {
data_src->mem_lvl = PERF_MEM_LVL_L3;
data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
@@ -844,6 +876,15 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
}
+}
+
+static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
+ union perf_mem_data_src *data_src)
+{
+ if (data_src->mem_op == PERF_MEM_OP_LOAD)
+ arm_spe__synth_ld_memory_level(record, data_src);
+ if (data_src->mem_op == PERF_MEM_OP_STORE)
+ arm_spe__synth_st_memory_level(record, data_src);
if (!data_src->mem_lvl) {
data_src->mem_lvl = PERF_MEM_LVL_NA;
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 10/14] perf arm_spe: Fill memory levels for FEAT_SPEv1p4
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (8 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 09/14] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 11/14] perf arm_spe: Improve CPU number retrieving in per-thread mode Leo Yan
` (3 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Starting with FEAT_SPEv1p4, Arm SPE provides information on Level 2 data
cache and recently fetched events. This patch fills in the memory levels
for these new events.
The recently fetched events are mapped to the line-fill buffer (LFB). In
general, the latency of accessing the LFB is higher than that of the L1
cache but lower than that of the L2 cache. Thus, the LFB is placed
between the L1 cache and the L2 cache in the memory hierarchy
information.
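The resulting priority order for loads can also be written as a table,
which makes the position of the LFB between L1 and L2 explicit. This is
only a sketch: the bit values, the enum and classify_load() are
hypothetical, and the patch itself uses an if/else chain rather than a
table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event bit positions, for illustration only. */
#define EV_L1D_ACCESS		(1u << 0)
#define EV_L1D_MISS		(1u << 1)
#define EV_L2D_ACCESS		(1u << 2)
#define EV_L2D_MISS		(1u << 3)
#define EV_LLC_ACCESS		(1u << 4)
#define EV_LLC_MISS		(1u << 5)
#define EV_RECENTLY_FETCHED	(1u << 6)

enum src {
	SRC_NA, SRC_L1_HIT, SRC_LFB_HIT, SRC_L2_HIT, SRC_L3_HIT,
	SRC_L3_MISS, SRC_L2_MISS, SRC_L1_MISS,
};

struct rule {
	uint32_t mask;	/* bits to inspect */
	uint32_t want;	/* value the inspected bits must have */
	enum src src;
};

/* First match wins: hits ascend L1 -> LFB -> L2 -> L3, misses descend. */
static const struct rule load_rules[] = {
	{ EV_L1D_ACCESS | EV_L1D_MISS,	EV_L1D_ACCESS,		SRC_L1_HIT },
	{ EV_RECENTLY_FETCHED,		EV_RECENTLY_FETCHED,	SRC_LFB_HIT },
	{ EV_L2D_ACCESS | EV_L2D_MISS,	EV_L2D_ACCESS,		SRC_L2_HIT },
	{ EV_LLC_ACCESS | EV_LLC_MISS,	EV_LLC_ACCESS,		SRC_L3_HIT },
	{ EV_LLC_MISS,			EV_LLC_MISS,		SRC_L3_MISS },
	{ EV_L2D_MISS,			EV_L2D_MISS,		SRC_L2_MISS },
	{ EV_L1D_MISS,			EV_L1D_MISS,		SRC_L1_MISS },
};

static enum src classify_load(uint32_t type)
{
	size_t i;

	for (i = 0; i < sizeof(load_rules) / sizeof(load_rules[0]); i++)
		if ((type & load_rules[i].mask) == load_rules[i].want)
			return load_rules[i].src;

	return SRC_NA;
}
```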
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 3 +++
tools/perf/util/arm-spe.c | 14 ++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 03da55453da8fd2e7b9e2dcba3ddcf5243599e1c..3afa8703b21db9d231eef93fe981e0c20d562e83 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -25,6 +25,9 @@
#define ARM_SPE_SVE_PARTIAL_PRED BIT(EV_PARTIAL_PREDICATE)
#define ARM_SPE_SVE_EMPTY_PRED BIT(EV_EMPTY_PREDICATE)
#define ARM_SPE_IN_TXN BIT(EV_TRANSACTIONAL)
+#define ARM_SPE_L2D_ACCESS BIT(EV_L2D_ACCESS)
+#define ARM_SPE_L2D_MISS BIT(EV_L2D_MISS)
+#define ARM_SPE_RECENTLY_FETCHED BIT(EV_RECENTLY_FETCHED)
enum arm_spe_op_type {
/* First level operation type */
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 688f6cd0f739e2b5f23a7776a7f2ebc97c12a2dd..3715afbe1e4713b5eebb00afbcb3eaa56ff1c49c 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -844,6 +844,12 @@ static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
if (arm_spe_is_cache_hit(record->type, L1D)) {
data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ } else if (record->type & ARM_SPE_RECENTLY_FETCHED) {
+ data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_LFB;
+ } else if (arm_spe_is_cache_hit(record->type, L2D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
} else if (arm_spe_is_cache_hit(record->type, LLC)) {
data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
@@ -855,6 +861,9 @@ static void arm_spe__synth_ld_memory_level(const struct arm_spe_record *record,
} else if (arm_spe_is_cache_miss(record->type, LLC)) {
data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_MISS;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ } else if (arm_spe_is_cache_miss(record->type, L2D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_MISS;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
} else if (arm_spe_is_cache_miss(record->type, L1D)) {
data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_MISS;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
@@ -870,6 +879,11 @@ static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, LLC) ?
PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ } else if (arm_spe_is_cache_level(record->type, L2D)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L2;
+ data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L2D) ?
+ PERF_MEM_LVL_MISS : PERF_MEM_LVL_HIT;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
} else if (arm_spe_is_cache_level(record->type, L1D)) {
data_src->mem_lvl = PERF_MEM_LVL_L1;
data_src->mem_lvl |= arm_spe_is_cache_miss(record->type, L1D) ?
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 11/14] perf arm_spe: Improve CPU number retrieving in per-thread mode
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (9 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 10/14] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 12/14] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
` (2 subsequent siblings)
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
In per-thread mode on a homogeneous system, the current code simply
picks the first metadata entry for data source parsing.
This change improves that by using the PMU type to find the matching PMU
event. From there, it reads the CPU map and uses the first CPU ID to
fetch the metadata.
Although this makes no difference when there's only one Arm SPE PMU, it
helps for future support of multiple SPE events.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 3715afbe1e4713b5eebb00afbcb3eaa56ff1c49c..87cbba941d6f3066697ff430d6ce8e085b173032 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -914,6 +914,9 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
union perf_mem_data_src *data_src)
{
struct arm_spe *spe = speq->spe;
+ struct perf_cpu_map *cpus;
+ struct perf_cpu perf_cpu;
+ int16_t cpu_nr;
u64 *metadata = NULL;
u64 midr;
unsigned int i;
@@ -935,13 +938,21 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
if (!spe->is_homogeneous)
return false;
- /* In homogeneous system, simply use CPU0's metadata */
- if (spe->metadata)
- metadata = spe->metadata[0];
+ cpus = perf_pmus__find_by_type(spe->pmu_type)->cpus;
+ if (!cpus)
+ return false;
+
+ /* In a homogeneous system, fetch the first CPU in the map. */
+ perf_cpu = perf_cpu_map__cpu(cpus, 0);
+ if (perf_cpu.cpu == -1)
+ return false;
+
+ cpu_nr = perf_cpu.cpu;
} else {
- metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
+ cpu_nr = speq->cpu;
}
+ metadata = arm_spe__get_metadata_by_cpu(spe, cpu_nr);
if (!metadata)
return false;
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 12/14] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu()
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (10 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 11/14] perf arm_spe: Improve CPU number retrieving in per-thread mode Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 13/14] perf arm_spe: Set HITM flag Leo Yan
2025-07-07 13:39 ` [PATCH v3 14/14] perf arm_spe: Allow parsing both data source and events Leo Yan
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Handle "CPU=-1" (per-thread mode) in the arm_spe__get_metadata_by_cpu()
function. As a result, the function is more general and will be invoked
by a subsequent change.
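The lookup with the per-thread fallback can be sketched in isolation as
below. The structures are simplified, hypothetical stand-ins: in
particular, 'first_pmu_cpu' replaces the lookup of the first CPU in the
PMU's CPU map, and none of these names exist in perf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the perf structures. */
struct cpu_meta {
	uint64_t cpu;
	uint64_t midr;
};

struct spe_info {
	const struct cpu_meta *metadata;
	size_t nr_cpu;
	bool is_homogeneous;
	int first_pmu_cpu;	/* first CPU in the PMU's CPU map, or -1 */
};

static const struct cpu_meta *
get_metadata_by_cpu(const struct spe_info *spe, int cpu)
{
	size_t i;

	assert(spe != NULL);

	if (!spe->metadata)
		return NULL;

	/* CPU ID is -1 in per-thread mode. */
	if (cpu < 0) {
		/* Without a CPU ID, a heterogeneous system is ambiguous. */
		if (!spe->is_homogeneous)
			return NULL;
		if (spe->first_pmu_cpu < 0)
			return NULL;
		cpu = spe->first_pmu_cpu;
	}

	for (i = 0; i < spe->nr_cpu; i++)
		if (spe->metadata[i].cpu == (uint64_t)cpu)
			return &spe->metadata[i];

	return NULL;
}
```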
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 55 ++++++++++++++++++++++-------------------------
1 file changed, 26 insertions(+), 29 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 87cbba941d6f3066697ff430d6ce8e085b173032..e89182b6d27d3d00357db72d804f1d22d5765937 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -317,15 +317,38 @@ static int arm_spe_set_tid(struct arm_spe_queue *speq, pid_t tid)
return 0;
}
-static u64 *arm_spe__get_metadata_by_cpu(struct arm_spe *spe, u64 cpu)
+static u64 *arm_spe__get_metadata_by_cpu(struct arm_spe *spe, int cpu)
{
+ struct perf_cpu_map *cpus;
+ struct perf_cpu perf_cpu;
u64 i;
if (!spe->metadata)
return NULL;
+ /* CPU ID is -1 for per-thread mode */
+ if (cpu < 0) {
+ /*
+ * On the heterogeneous system, due to CPU ID is -1,
+ * cannot confirm the data source packet is supported.
+ */
+ if (!spe->is_homogeneous)
+ return NULL;
+
+ cpus = perf_pmus__find_by_type(spe->pmu_type)->cpus;
+ if (!cpus)
+ return NULL;
+
+ /* In a homogeneous system, fetch the first CPU in the map. */
+ perf_cpu = perf_cpu_map__cpu(cpus, 0);
+ if (perf_cpu.cpu == -1)
+ return NULL;
+
+ cpu = perf_cpu.cpu;
+ }
+
for (i = 0; i < spe->metadata_nr_cpu; i++)
- if (spe->metadata[i][ARM_SPE_CPU] == cpu)
+ if (spe->metadata[i][ARM_SPE_CPU] == (u64)cpu)
return spe->metadata[i];
return NULL;
@@ -914,9 +937,6 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
union perf_mem_data_src *data_src)
{
struct arm_spe *spe = speq->spe;
- struct perf_cpu_map *cpus;
- struct perf_cpu perf_cpu;
- int16_t cpu_nr;
u64 *metadata = NULL;
u64 midr;
unsigned int i;
@@ -929,30 +949,7 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
cpuid = perf_env__cpuid(spe->session->evlist->env);
midr = strtol(cpuid, NULL, 16);
} else {
- /* CPU ID is -1 for per-thread mode */
- if (speq->cpu < 0) {
- /*
- * On the heterogeneous system, due to CPU ID is -1,
- * cannot confirm the data source packet is supported.
- */
- if (!spe->is_homogeneous)
- return false;
-
- cpus = perf_pmus__find_by_type(spe->pmu_type)->cpus;
- if (!cpus)
- return false;
-
- /* In a homogeneous system, fetch the first CPU in the map. */
- perf_cpu = perf_cpu_map__cpu(cpus, 0);
- if (perf_cpu.cpu == -1)
- return false;
-
- cpu_nr = perf_cpu.cpu;
- } else {
- cpu_nr = speq->cpu;
- }
-
- metadata = arm_spe__get_metadata_by_cpu(spe, cpu_nr);
+ metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
if (!metadata)
return false;
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 13/14] perf arm_spe: Set HITM flag
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (11 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 12/14] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
2025-07-07 13:39 ` [PATCH v3 14/14] perf arm_spe: Allow parsing both data source and events Leo Yan
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
Since FEAT_SPEv1p4, Arm SPE provides two extra events: "Cache data
modified" and "Data snooped".
Set the snoop mode as follows:
- If both the "Cache data modified" event and the "Data snooped" event
are set, which indicates a load operation that snooped from an outside
cache and hit a modified copy, set the HITM flag so that false sharing
can be inspected.
- If the snooped event bit is not set, and the snooped event is
supported by the hardware, set NONE mode (no snoop operation).
- If the snooped event bit is not set, and the event is not supported or
the event info is absent from the metadata, set NA mode (not
available).
Don't set any snoop mode based on the "Cache data modified" event alone,
as it hits a local modified copy.
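The decision above can be summarised as a small standalone function. It
is a sketch under assumptions: the bit values and the enum are
hypothetical, 'have_caps' stands for "metadata with the event filter
capability was found", and the HIT case (snooped without the modified
bit) is taken from the diff below rather than from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event bit positions, for illustration only. */
#define EV_DATA_SNOOPED		(1u << 0)
#define EV_DATA_MODIFIED	(1u << 1)

enum snoop { SNOOP_NA, SNOOP_NONE, SNOOP_HIT, SNOOP_HITM };

static enum snoop snoop_mode(uint32_t type, bool have_caps, uint32_t supported)
{
	/* Snooped from an outside cache: HITM if the copy was modified. */
	if (type & EV_DATA_SNOOPED)
		return (type & EV_DATA_MODIFIED) ? SNOOP_HITM : SNOOP_HIT;

	/* Bit clear and event unknown or unsupported: nothing can be said. */
	if (!have_caps || !(supported & EV_DATA_SNOOPED))
		return SNOOP_NA;

	/* Bit clear on hardware that reports it: no snoop happened. */
	return SNOOP_NONE;
}
```

Note that a record with only the modified bit set falls through to the
NONE/NA branch, which matches the last paragraph above.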
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 2 ++
tools/perf/util/arm-spe.c | 26 +++++++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 3afa8703b21db9d231eef93fe981e0c20d562e83..fbb57f8052371e51d562d9dd6098e97fc099461c 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -28,6 +28,8 @@
#define ARM_SPE_L2D_ACCESS BIT(EV_L2D_ACCESS)
#define ARM_SPE_L2D_MISS BIT(EV_L2D_MISS)
#define ARM_SPE_RECENTLY_FETCHED BIT(EV_RECENTLY_FETCHED)
+#define ARM_SPE_DATA_SNOOPED BIT(EV_DATA_SNOOPED)
+#define ARM_SPE_HITM BIT(EV_CACHE_DATA_MODIFIED)
enum arm_spe_op_type {
/* First level operation type */
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index e89182b6d27d3d00357db72d804f1d22d5765937..082a1c69b42047b6fdf263ab2c74cc5fa9accd13 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -915,9 +915,12 @@ static void arm_spe__synth_st_memory_level(const struct arm_spe_record *record,
}
}
-static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
+static void arm_spe__synth_memory_level(struct arm_spe_queue *speq,
+ const struct arm_spe_record *record,
union perf_mem_data_src *data_src)
{
+ struct arm_spe *spe = speq->spe;
+
if (data_src->mem_op == PERF_MEM_OP_LOAD)
arm_spe__synth_ld_memory_level(record, data_src);
if (data_src->mem_op == PERF_MEM_OP_STORE)
@@ -928,6 +931,25 @@ static void arm_spe__synth_memory_level(const struct arm_spe_record *record,
data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
}
+ if (record->type & ARM_SPE_DATA_SNOOPED) {
+ if (record->type & ARM_SPE_HITM)
+ data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+ else
+ data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
+ } else {
+ u64 *metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
+
+ /*
+ * Set NA ("Not available") mode if no meta data or the
+ * SNOOPED event is not supported.
+ */
+ if (!metadata ||
+ !(metadata[ARM_SPE_CAP_EVENT_FILTER] & ARM_SPE_DATA_SNOOPED))
+ data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+ else
+ data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+ }
+
if (record->type & ARM_SPE_REMOTE_ACCESS)
data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
}
@@ -984,7 +1006,7 @@ arm_spe__synth_data_source(struct arm_spe_queue *speq,
return data_src;
if (!arm_spe__synth_ds(speq, record, &data_src))
- arm_spe__synth_memory_level(record, &data_src);
+ arm_spe__synth_memory_level(speq, record, &data_src);
if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
data_src.mem_dtlb = PERF_MEM_TLB_WK;
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 14/14] perf arm_spe: Allow parsing both data source and events
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
` (12 preceding siblings ...)
2025-07-07 13:39 ` [PATCH v3 13/14] perf arm_spe: Set HITM flag Leo Yan
@ 2025-07-07 13:39 ` Leo Yan
13 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2025-07-07 13:39 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, James Clark, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, German Gomez, Ali Saidi
Cc: Arnaldo Carvalho de Melo, linux-arm-kernel, linux-perf-users,
Leo Yan
The current code skips event parsing after generating the data source.
The reason is that data source packets carry cache and snooping related
info, so the subsequent event packets might contain duplicate info.
This commit continues parsing the events after data source analysis. If
the data source does not provide the memory level and snooping type,
the event info is used to synthesize the related fields.
As a result, both the peer snoop option ('-d peer') and the HITM options
('-d tot/lcl/rmt') are supported by Arm SPE in perf c2c.
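The "data source first, events as fallback" rule amounts to filling only
the fields that are still unset. A minimal sketch, assuming that zero
means "not set" for every field and using a hypothetical structure
instead of union perf_mem_data_src:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the synthesized fields; 0 means "not set". */
struct src_fields {
	uint32_t lvl;
	uint32_t snoop;
	uint32_t remote;
};

/*
 * Keep whatever data source parsing already provided and use the
 * event-derived values only for fields that are still unset.
 */
static void fill_from_events(struct src_fields *dst,
			     const struct src_fields *ev)
{
	assert(dst != NULL && ev != NULL);

	if (!dst->lvl)
		dst->lvl = ev->lvl;
	if (!dst->snoop)
		dst->snoop = ev->snoop;
	if (!dst->remote)
		dst->remote = ev->remote;
}
```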
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 75 ++++++++++++++++++++++++++++-------------------
1 file changed, 45 insertions(+), 30 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 082a1c69b42047b6fdf263ab2c74cc5fa9accd13..e2d9035fcd28a4c2dd7a996658a5504c856bfca6 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -921,40 +921,56 @@ static void arm_spe__synth_memory_level(struct arm_spe_queue *speq,
{
struct arm_spe *spe = speq->spe;
- if (data_src->mem_op == PERF_MEM_OP_LOAD)
- arm_spe__synth_ld_memory_level(record, data_src);
- if (data_src->mem_op == PERF_MEM_OP_STORE)
- arm_spe__synth_st_memory_level(record, data_src);
+ /*
+ * The data source packet contains more info for cache levels for
+ * peer snooping. So respect the memory level if has been set by
+ * data source parsing.
+ */
+ if (!data_src->mem_lvl) {
+ if (data_src->mem_op == PERF_MEM_OP_LOAD)
+ arm_spe__synth_ld_memory_level(record, data_src);
+ if (data_src->mem_op == PERF_MEM_OP_STORE)
+ arm_spe__synth_st_memory_level(record, data_src);
+ }
if (!data_src->mem_lvl) {
data_src->mem_lvl = PERF_MEM_LVL_NA;
data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
}
- if (record->type & ARM_SPE_DATA_SNOOPED) {
- if (record->type & ARM_SPE_HITM)
- data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
- else
- data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
- } else {
- u64 *metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
-
- /*
- * Set NA ("Not available") mode if no meta data or the
- * SNOOPED event is not supported.
- */
- if (!metadata ||
- !(metadata[ARM_SPE_CAP_EVENT_FILTER] & ARM_SPE_DATA_SNOOPED))
- data_src->mem_snoop = PERF_MEM_SNOOP_NA;
- else
- data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+ /*
+ * If 'mem_snoop' has been set by data source packet, skip to set
+ * it at here.
+ */
+ if (!data_src->mem_snoop) {
+ if (record->type & ARM_SPE_DATA_SNOOPED) {
+ if (record->type & ARM_SPE_HITM)
+ data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+ else
+ data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
+ } else {
+ u64 *metadata =
+ arm_spe__get_metadata_by_cpu(spe, speq->cpu);
+
+ /*
+ * Set NA ("Not available") mode if no meta data or the
+ * SNOOPED event is not supported.
+ */
+ if (!metadata ||
+ !(metadata[ARM_SPE_CAP_EVENT_FILTER] & ARM_SPE_DATA_SNOOPED))
+ data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+ else
+ data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+ }
}
- if (record->type & ARM_SPE_REMOTE_ACCESS)
- data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+ if (!data_src->mem_remote) {
+ if (record->type & ARM_SPE_REMOTE_ACCESS)
+ data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+ }
}
-static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
+static void arm_spe__synth_ds(struct arm_spe_queue *speq,
const struct arm_spe_record *record,
union perf_mem_data_src *data_src)
{
@@ -973,19 +989,18 @@ static bool arm_spe__synth_ds(struct arm_spe_queue *speq,
} else {
metadata = arm_spe__get_metadata_by_cpu(spe, speq->cpu);
if (!metadata)
- return false;
+ return;
midr = metadata[ARM_SPE_CPU_MIDR];
}
for (i = 0; i < ARRAY_SIZE(data_source_handles); i++) {
if (is_midr_in_range_list(midr, data_source_handles[i].midr_ranges)) {
- data_source_handles[i].ds_synth(record, data_src);
- return true;
+ return data_source_handles[i].ds_synth(record, data_src);
}
}
- return false;
+ return;
}
static union perf_mem_data_src
@@ -1005,8 +1020,8 @@ arm_spe__synth_data_source(struct arm_spe_queue *speq,
else
return data_src;
- if (!arm_spe__synth_ds(speq, record, &data_src))
- arm_spe__synth_memory_level(speq, record, &data_src);
+ arm_spe__synth_ds(speq, record, &data_src);
+ arm_spe__synth_memory_level(speq, record, &data_src);
if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
data_src.mem_dtlb = PERF_MEM_TLB_WK;
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter
2025-07-07 13:39 ` [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter Leo Yan
@ 2025-07-14 13:07 ` Will Deacon
2025-07-14 15:09 ` Leo Yan
0 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2025-07-14 13:07 UTC (permalink / raw)
To: Leo Yan
Cc: Mark Rutland, James Clark, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
German Gomez, Ali Saidi, Arnaldo Carvalho de Melo,
linux-arm-kernel, linux-perf-users
On Mon, Jul 07, 2025 at 02:39:23PM +0100, Leo Yan wrote:
> Expose an "event_filter" entry in the caps folder to inform user space
> about which events can be filtered.
>
> Change the return type of arm_spe_pmu_cap_get() from u32 to u64 to
> accommodate the added event filter entry.
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> drivers/perf/arm_spe_pmu.c | 36 ++++++++++++++++++++----------------
> 1 file changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 3efed8839a4ec5604eba242cb620327cd2a6a87d..78d8cb59b66d7bc6319eb4ee40e6d2d32ffb8bdf 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -115,6 +115,7 @@ enum arm_spe_pmu_capabilities {
> SPE_PMU_CAP_FEAT_MAX,
> SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX,
> SPE_PMU_CAP_MIN_IVAL,
> + SPE_PMU_CAP_EVENT_FILTER,
> };
>
> static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
> @@ -122,7 +123,21 @@ static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
> [SPE_PMU_CAP_ERND] = SPE_PMU_FEAT_ERND,
> };
>
> -static u32 arm_spe_pmu_cap_get(struct arm_spe_pmu *spe_pmu, int cap)
> +static u64 arm_spe_pmsevfr_res0(u16 pmsver)
> +{
> + switch (pmsver) {
> + case ID_AA64DFR0_EL1_PMSVer_IMP:
> + return PMSEVFR_EL1_RES0_IMP;
> + case ID_AA64DFR0_EL1_PMSVer_V1P1:
> + return PMSEVFR_EL1_RES0_V1P1;
> + case ID_AA64DFR0_EL1_PMSVer_V1P2:
> + /* Return the highest version we support in default */
> + default:
> + return PMSEVFR_EL1_RES0_V1P2;
> + }
> +}
Hmm. This logic was already a little shaky and so I'm not sure it's a
good idea to expose it directly to userspace. Maintaining RES0 masks for
different versions of SPE won't scale and there are already things that
we can't sensibly handle. For example, E[8]:
| When (FEAT_SPEv1p4 is implemented or filtering on event 8 is
| optionally supported) and event 8 is implemented:
So, stepping back, can we remove this stuff altogether? The bits are
RAZ/WI in the case that the event is not implemented, but that means that:
| Software can rely on the field reading as all 0s, and on writes being
| ignored.
so why are we even bothering to police this?
In other words, remove arm_spe_pmsevfr_res0() and the two checks that
use it in arm_spe_pmu_event_init(). If userspace tries to filter events
that aren't implemented, then it gets to keep the pieces.
Will
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter
2025-07-14 13:07 ` Will Deacon
@ 2025-07-14 15:09 ` Leo Yan
2025-07-14 15:13 ` Will Deacon
0 siblings, 1 reply; 21+ messages in thread
From: Leo Yan @ 2025-07-14 15:09 UTC (permalink / raw)
To: Will Deacon
Cc: Mark Rutland, James Clark, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
German Gomez, Ali Saidi, Arnaldo Carvalho de Melo,
linux-arm-kernel, linux-perf-users
Hi Will,
On Mon, Jul 14, 2025 at 02:07:32PM +0100, Will Deacon wrote:
[...]
> > +static u64 arm_spe_pmsevfr_res0(u16 pmsver)
> > +{
> > + switch (pmsver) {
> > + case ID_AA64DFR0_EL1_PMSVer_IMP:
> > + return PMSEVFR_EL1_RES0_IMP;
> > + case ID_AA64DFR0_EL1_PMSVer_V1P1:
> > + return PMSEVFR_EL1_RES0_V1P1;
> > + case ID_AA64DFR0_EL1_PMSVer_V1P2:
> > + /* Return the highest version we support in default */
> > + default:
> > + return PMSEVFR_EL1_RES0_V1P2;
> > + }
> > +}
>
> Hmm. This logic was already a little shaky and so I'm not sure it's a
> good idea to expose it directly to userspace. Maintaining RES0 masks for
> different versions of SPE won't scale and there are already things that
> we can't sensibly handle. For example, E[8]:
>
> | When (FEAT_SPEv1p4 is implemented or filtering on event 8 is
> | optionally supported) and event 8 is implemented:
>
> So, stepping back, can we remove this stuff altogether? The bits are
> RAZ/WI in the case that the event is not implemented, but that means that:
>
> | Software can rely on the field reading as all 0s, and on writes being
> | ignored.
>
> so why are we even bothering to police this?
It's fine with me to remove the validation for the event filter.
However, I have a question; see my comment below.
> In other words, remove arm_spe_pmsevfr_res0() and the two checks that
> use it in arm_spe_pmu_event_init(). If userspace tries to filter events
> that aren't implemented, then it gets to keep the pieces.
Then the question is: what information should be exposed to userspace
so that tools can decide which events are valid?
I would suggest exposing a new entry, "caps/version", to indicate the
SPE version number. Tools can use this to apply the appropriate event
validation. Please let me know if this works for you.
Thanks,
Leo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter
2025-07-14 15:09 ` Leo Yan
@ 2025-07-14 15:13 ` Will Deacon
2025-07-14 15:42 ` Leo Yan
0 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2025-07-14 15:13 UTC (permalink / raw)
To: Leo Yan
Cc: Mark Rutland, James Clark, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
German Gomez, Ali Saidi, Arnaldo Carvalho de Melo,
linux-arm-kernel, linux-perf-users
On Mon, Jul 14, 2025 at 04:09:21PM +0100, Leo Yan wrote:
> On Mon, Jul 14, 2025 at 02:07:32PM +0100, Will Deacon wrote:
>
> [...]
>
> > > +static u64 arm_spe_pmsevfr_res0(u16 pmsver)
> > > +{
> > > + switch (pmsver) {
> > > + case ID_AA64DFR0_EL1_PMSVer_IMP:
> > > + return PMSEVFR_EL1_RES0_IMP;
> > > + case ID_AA64DFR0_EL1_PMSVer_V1P1:
> > > + return PMSEVFR_EL1_RES0_V1P1;
> > > + case ID_AA64DFR0_EL1_PMSVer_V1P2:
> > > + /* Return the highest version we support in default */
> > > + default:
> > > + return PMSEVFR_EL1_RES0_V1P2;
> > > + }
> > > +}
> >
> > Hmm. This logic was already a little shaky and so I'm not sure it's a
> > good idea to expose it directly to userspace. Maintaining RES0 masks for
> > different versions of SPE won't scale and there are already things that
> > we can't sensibly handle. For example, E[8]:
> >
> > | When (FEAT_SPEv1p4 is implemented or filtering on event 8 is
> > | optionally supported) and event 8 is implemented:
> >
> > So, stepping back, can we remove this stuff altogether? The bits are
> > RAZ/WI in the case that the event is not implemented, but that means that:
> >
> > | Software can rely on the field reading as all 0s, and on writes being
> > | ignored.
> >
> > so why are we even bothering to police this?
>
> It's fine with me to remove the validation for the event filter.
>
> However, I have a question; see my comment below.
>
> > In other words, remove arm_spe_pmsevfr_res0() and the two checks that
> > use it in arm_spe_pmu_event_init(). If userspace tries to filter events
> > that aren't implemented, then it gets to keep the pieces.
>
> Then the question is: what information should be exposed to userspace
> so that tools can decide which events are valid?
>
> I would suggest exposing a new entry, "caps/version", to indicate the
> SPE version number. Tools can use this to apply the appropriate event
> validation. Please let me know if this works for you.
I thought userspace typically had midr-based json files to figure this
stuff out? The supported events aren't probe-able afaict so I don't
think the driver can help.
Will
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter
2025-07-14 15:13 ` Will Deacon
@ 2025-07-14 15:42 ` Leo Yan
2025-07-15 11:15 ` James Clark
0 siblings, 1 reply; 21+ messages in thread
From: Leo Yan @ 2025-07-14 15:42 UTC (permalink / raw)
To: Will Deacon
Cc: Mark Rutland, James Clark, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
German Gomez, Ali Saidi, Arnaldo Carvalho de Melo,
linux-arm-kernel, linux-perf-users
On Mon, Jul 14, 2025 at 04:13:49PM +0100, Will Deacon wrote:
[...]
> > > In other words, remove arm_spe_pmsevfr_res0() and the two checks that
> > > use it in arm_spe_pmu_event_init(). If userspace tries to filter events
> > > that aren't implemented, then it gets to keep the pieces.
> >
> > Then the question is: what information should be exposed to userspace
> > so that tools can decide which events are valid?
> >
> > I would suggest exposing a new entry, "caps/version", to indicate the
> > SPE version number. Tools can use this to apply the appropriate event
> > validation. Please let me know if this works for you.
>
> I thought userspace typically had midr-based json files to figure this
> stuff out?
Yes, the perf tool records the CPU MIDR in the metadata.
However, I deliberately tried to avoid relying on this approach,
because perf would then need to maintain a mapping between:
MIDR -> Arm SPE version -> Events
Given the large number of CPU variants, maintaining this relationship
between CPU ID and SPE version, and subsequently mapping it to supported
events, would be quite complex. Additional effort would be required each
time a new CPU variant is introduced.
> The supported events aren't probe-able afaict so I don't
> think the driver can help.
Although the events are not probe-able, some events are specific to
certain Arm SPE versions. For example, E[23]:
| Data snooped.
| When FEAT_SPEv1p4 is implemented
With SPE version information, the perf tool can decode E[23] == 0 as:
"No snooping" for SPEv1p4
"No available information for snooping" for earlier SPE versions
Without SPE version information, it's impossible to distinguish
between these two cases.
Thanks,
Leo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter
2025-07-14 15:42 ` Leo Yan
@ 2025-07-15 11:15 ` James Clark
2025-07-17 11:43 ` Will Deacon
0 siblings, 1 reply; 21+ messages in thread
From: James Clark @ 2025-07-15 11:15 UTC (permalink / raw)
To: Leo Yan, Will Deacon
Cc: Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
German Gomez, Ali Saidi, Arnaldo Carvalho de Melo,
linux-arm-kernel, linux-perf-users
On 14/07/2025 4:42 pm, Leo Yan wrote:
> On Mon, Jul 14, 2025 at 04:13:49PM +0100, Will Deacon wrote:
>
> [...]
>
>>>> In other words, remove arm_spe_pmsevfr_res0() and the two checks that
>>>> use it in arm_spe_pmu_event_init(). If userspace tries to filter events
>>>> that aren't implemented, then it gets to keep the pieces.
>>>
>>> Then the question is: what information should be exposed to userspace
>>> so that tools can decide which events are valid?
>>>
>>> I would suggest to expose a new entry, "caps/version", to indicate the
>>> SPE version number. Tools can use this to apply the appropriate event
>>> validation. Please let me know if this works for you.
>>
>> I thought userspace typically had midr-based json files to figure this
>> stuff out?
>
> Yes, the perf tool records the CPU MIDR in the metadata.
>
> However, I deliberately tried to avoid relying on this approach,
> because the perf would then need to maintain a mapping between:
>
> MIDR -> Arm SPE version -> Events
>
> Given the large number of CPU variants, maintaining this relationship
> between CPU ID and SPE version, and subsequently mapping it to supported
> events, would be quite complex. Additional effort would be required each
> time a new CPU variant is introduced.
>
>> The supported events aren't probe-able afaict so I don't
>> think the driver can help.
>
If RAZ/WI behaviour is guaranteed for unimplemented events, can we not
discover the supported filter by writing all ones and reading back what stuck?
> Although the events are not probe-able, some events are specific to
> certain Arm SPE versions. For example, E[23]:
>
> | Data snooped.
> | When FEAT_SPEv1p4 is implemented
>
> With SPE version information, the perf tool can decode E[23] == 0 as:
>
>    "No snooping" for SPEv1p4
>
> "No available information for snooping" for earlier SPE versions
>
> Without SPE version information, it's impossible to distinguish
> between these two cases.
>
> Thanks,
> Leo
I think Leo has a point that some of these shouldn't require any MIDR
mappings, and if we're using a filter for some builtin part of the Perf
tool then it would be good to have that work everywhere, rather than
having to update for every new CPU.
We're already publishing the SPE version in dmesg so if we decide to not
publish the filters we'd probably go and try to parse that instead. At
that point maybe we should publish the SPE version in sysfs properly. At
least that's scalable unlike having to update the filters all the time.
James
* Re: [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter
2025-07-15 11:15 ` James Clark
@ 2025-07-17 11:43 ` Will Deacon
0 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2025-07-17 11:43 UTC (permalink / raw)
To: James Clark
Cc: Leo Yan, Mark Rutland, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
German Gomez, Ali Saidi, Arnaldo Carvalho de Melo,
linux-arm-kernel, linux-perf-users
On Tue, Jul 15, 2025 at 12:15:07PM +0100, James Clark wrote:
> On 14/07/2025 4:42 pm, Leo Yan wrote:
> > On Mon, Jul 14, 2025 at 04:13:49PM +0100, Will Deacon wrote:
> >
> > [...]
> >
> > > > > In other words, remove arm_spe_pmsevfr_res0() and the two checks that
> > > > > use it in arm_spe_pmu_event_init(). If userspace tries to filter events
> > > > > that aren't implemented, then it gets to keep the pieces.
> > > >
> > > > Then the question is: what information should be exposed to userspace
> > > > so that tools can decide which events are valid?
> > > >
> > > > I would suggest to expose a new entry, "caps/version", to indicate the
> > > > SPE version number. Tools can use this to apply the appropriate event
> > > > validation. Please let me know if this works for you.
> > >
> > > I thought userspace typically had midr-based json files to figure this
> > > stuff out?
> >
> > Yes, the perf tool records the CPU MIDR in the metadata.
> >
> > However, I deliberately tried to avoid relying on this approach,
> > because the perf would then need to maintain a mapping between:
> >
> > MIDR -> Arm SPE version -> Events
> >
> > Given the large number of CPU variants, maintaining this relationship
> > between CPU ID and SPE version, and subsequently mapping it to supported
> > events, would be quite complex. Additional effort would be required each
> > time a new CPU variant is introduced.
> >
> > > The supported events aren't probe-able afaict so I don't
> > > think the driver can help.
> >
>
> If the RAZ/WI for not implemented is guaranteed, can we not discover the
> supported filter by writing all ones and reading back what stuck?
That might be the best option.
Looking back at older versions of the architecture, unallocated fields
in PMSEVFR_EL1 were RES0 in revision D.a of the Arm ARM but everything
became RAZ/WI in revision D.b, so the whole reason we were maintaining
these masks in the driver is no longer relevant.
As you say, we can just probe the thing and report the result back to
userspace.
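Roughly, the probe would look like the sketch below. It models the register
in plain C so it can be compiled anywhere: struct fake_pmsevfr, fake_read()
and fake_write() are hypothetical stand-ins, and a real driver would use the
system register accessors for PMSEVFR_EL1 instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for PMSEVFR_EL1: bits outside 'implemented' behave as RAZ/WI.
 * Hypothetical model for illustration only.
 */
struct fake_pmsevfr {
	uint64_t implemented;	/* bits the "hardware" backs with storage */
	uint64_t value;		/* current register contents */
};

static void fake_write(struct fake_pmsevfr *reg, uint64_t val)
{
	/* Writes to RAZ/WI bits are ignored. */
	reg->value = val & reg->implemented;
}

static uint64_t fake_read(const struct fake_pmsevfr *reg)
{
	/* RAZ/WI bits were never stored, so they read as zero. */
	return reg->value;
}

/* Write all ones, read back what stuck, then restore the old value. */
static uint64_t probe_event_filter(struct fake_pmsevfr *reg)
{
	uint64_t saved, supported;

	assert(reg);
	saved = fake_read(reg);
	fake_write(reg, ~UINT64_C(0));
	supported = fake_read(reg);
	fake_write(reg, saved);

	return supported;
}
```

The value returned by probe_event_filter() is the mask that would be
reported to userspace, and it only means anything if the RAZ/WI behaviour
holds for every unallocated bit.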
Will
end of thread, other threads:[~2025-07-17 11:43 UTC | newest]
Thread overview: 21+ messages
2025-07-07 13:39 [PATCH v3 00/14] perf arm-spe: Support new events in FEAT_SPEv1p4 Leo Yan
2025-07-07 13:39 ` [PATCH v3 01/14] drivers/perf: arm_spe: Expose event filter Leo Yan
2025-07-14 13:07 ` Will Deacon
2025-07-14 15:09 ` Leo Yan
2025-07-14 15:13 ` Will Deacon
2025-07-14 15:42 ` Leo Yan
2025-07-15 11:15 ` James Clark
2025-07-17 11:43 ` Will Deacon
2025-07-07 13:39 ` [PATCH v3 02/14] perf arm_spe: Correct setting remote access Leo Yan
2025-07-07 13:39 ` [PATCH v3 03/14] perf arm_spe: Correct memory level for " Leo Yan
2025-07-07 13:39 ` [PATCH v3 04/14] perf arm_spe: Use full type for data_src Leo Yan
2025-07-07 13:39 ` [PATCH v3 05/14] perf arm_spe: Directly propagate raw event Leo Yan
2025-07-07 13:39 ` [PATCH v3 06/14] perf arm_spe: Decode event types for new features Leo Yan
2025-07-07 13:39 ` [PATCH v3 07/14] perf arm_spe: Add "event_filter" entry in meta data Leo Yan
2025-07-07 13:39 ` [PATCH v3 08/14] perf arm_spe: Refine memory level filling Leo Yan
2025-07-07 13:39 ` [PATCH v3 09/14] perf arm_spe: Separate setting of memory levels for loads and stores Leo Yan
2025-07-07 13:39 ` [PATCH v3 10/14] perf arm_spe: Fill memory levels for FEAT_SPEv1p4 Leo Yan
2025-07-07 13:39 ` [PATCH v3 11/14] perf arm_spe: Improve CPU number retrieving in per-thread mode Leo Yan
2025-07-07 13:39 ` [PATCH v3 12/14] perf arm_spe: Refactor arm_spe__get_metadata_by_cpu() Leo Yan
2025-07-07 13:39 ` [PATCH v3 13/14] perf arm_spe: Set HITM flag Leo Yan
2025-07-07 13:39 ` [PATCH v3 14/14] perf arm_spe: Allow parsing both data source and events Leo Yan