* [PATCH RESEND v4 0/8] perf arm_spe: Extend operations
@ 2026-01-06 12:07 Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 1/8] perf/uapi: Extend data source fields Leo Yan
` (7 more replies)
0 siblings, 8 replies; 10+ messages in thread
From: Leo Yan @ 2026-01-06 12:07 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel, Leo Yan
This series records operation information in the sample's data source
and SIMD flags, and updates the documentation.
No changes compared to the previous version; this version is only rebased
onto the latest perf-tools-next branch.
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
Changes in v4 (resend):
- Updated for Ian and James' review tags.
- Link to v4: https://lore.kernel.org/r/20260106-perf_support_arm_spev1-3-v4-0-bb2d143b3860@arm.com
Changes in v4:
- Updated for Ian and James' review tags.
- Rebased on the latest perf-tools-next branch.
- Link to v3: https://lore.kernel.org/r/20251112-perf_support_arm_spev1-3-v3-0-e63c9829f9d9@arm.com
Changes in v3:
- Rebased on the latest perf-tools-next branch.
- Link to v2: https://lore.kernel.org/r/20251017-perf_support_arm_spev1-3-v2-0-2d41e4746e1b@arm.com
Changes in v2:
- Refined to use enums for 2nd operation types. (James)
- Avoided adjusting bit positions for operations. (James)
- Used enum for extended operation type in uapi header and defined
individual bit fields for operation details in uapi header. (James)
- Refined SIMD flag definitions. (James)
- Extracted a separate commit for updating tool's header. (James/Arnaldo)
- Minor improvement for printing memory events.
- Rebased on the latest perf-tools-next branch.
- Link to v1: https://lore.kernel.org/r/20250929-perf_support_arm_spev1-3-v1-0-1150b3c83857@arm.com
---
Leo Yan (8):
perf/uapi: Extend data source fields
tools/include: Sync uapi/linux/perf.h with the kernel sources
perf mem: Print extended fields
perf arm_spe: Set extended fields in data source
perf sort: Support sort ASE and SME
perf sort: Sort disabled and full predicated flags
perf report: Update document for SIMD flags
perf arm_spe: Improve SIMD flags setting
include/uapi/linux/perf_event.h | 32 +++++++++++++++-
tools/include/uapi/linux/perf_event.h | 32 +++++++++++++++-
tools/perf/Documentation/perf-report.txt | 5 ++-
tools/perf/util/arm-spe.c | 56 ++++++++++++++++++++++++---
tools/perf/util/mem-events.c | 66 +++++++++++++++++++++++++++++---
tools/perf/util/sample.h | 21 +++++++---
tools/perf/util/sort.c | 21 +++++++---
7 files changed, 205 insertions(+), 28 deletions(-)
---
base-commit: cbd41c6d4c26c161a2b0e70ad411d3885ff13507
change-id: 20250820-perf_support_arm_spev1-3-b6efd6fc77b2
Best regards,
--
Leo Yan <leo.yan@arm.com>
* [PATCH RESEND v4 1/8] perf/uapi: Extend data source fields
2026-01-06 12:07 [PATCH RESEND v4 0/8] perf arm_spe: Extend operations Leo Yan
@ 2026-01-06 12:07 ` Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 2/8] tools/include: Sync uapi/linux/perf.h with the kernel sources Leo Yan
` (6 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-01-06 12:07 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel, Leo Yan
Arm CPUs introduce several new types of memory operations, such as MTE
tag access, system register access for nested virtualization, memcpy &
memset, and Guarded Control Stack (GCS).
For memory operation details, Arm SPE provides information like data
(parallel) processing, floating-point, predicated, atomic, exclusive,
acquire/release, gather/scatter, and conditional.
This commit introduces a field 'mem_op_ext' for the extended operation
type. The extended operation type can be combined with the existing
operation type to express a memory type. For example, a
PERF_MEM_EXT_OP_GCS type can be set along with PERF_MEM_OP_LOAD to
represent a load operation for GCS register access.
Bit fields are also added to represent detailed operation attributes.
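As a rough illustration of how the extended type combines with the
existing operation type, the sketch below packs a GCS load into a raw
64-bit data source value. The PERF_MEM_EXT_OP_* values are copied from
this patch, and PERF_MEM_OP_LOAD / PERF_MEM_OP_SHIFT from the existing
uapi header. The mask and the helper names (EXT_OP_MASK,
make_gcs_load(), ext_op_of()) are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Existing uapi values for the basic operation type. */
#define PERF_MEM_OP_LOAD	0x02
#define PERF_MEM_OP_SHIFT	0

/* Values added by this patch. */
#define PERF_MEM_EXT_OP_GCS	0x6
#define PERF_MEM_EXT_OP_SHIFT	46

/* Hypothetical helper mask: mem_op_ext is a 4-bit field. */
#define EXT_OP_MASK		0xfULL

static_assert(PERF_MEM_EXT_OP_GCS <= EXT_OP_MASK,
	      "extended type must fit in 4 bits");

/* Build a raw data source value describing a load that accesses GCS. */
static inline uint64_t make_gcs_load(void)
{
	return ((uint64_t)PERF_MEM_OP_LOAD << PERF_MEM_OP_SHIFT) |
	       ((uint64_t)PERF_MEM_EXT_OP_GCS << PERF_MEM_EXT_OP_SHIFT);
}

/* Extract the extended operation type from a raw value. */
static inline unsigned int ext_op_of(uint64_t val)
{
	return (unsigned int)((val >> PERF_MEM_EXT_OP_SHIFT) & EXT_OP_MASK);
}
```

A consumer would typically read both fields: mem_op says whether the
access is a load or a store, and mem_op_ext narrows down what kind of
access it is.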
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
include/uapi/linux/perf_event.h | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c44a8fb3e4181c91a1e6e3a40e23fcf1de421af3..3d2c5ee9282efc4a2310f554443082f1d0027889 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1330,14 +1330,32 @@ union perf_mem_data_src {
mem_snoopx : 2, /* Snoop mode, ext */
mem_blk : 3, /* Access blocked */
mem_hops : 3, /* Hop level */
- mem_rsvd : 18;
+ mem_op_ext : 4, /* Extended type of opcode */
+ mem_dp : 1, /* Data processing */
+ mem_fp : 1, /* Floating-point */
+ mem_pred : 1, /* Predicated */
+ mem_atomic : 1, /* Atomic operation */
+ mem_excl : 1, /* Exclusive */
+ mem_ar : 1, /* Acquire/release */
+ mem_sg : 1, /* Scatter/Gather */
+ mem_cond : 1, /* Conditional */
+ mem_rsvd : 6;
};
};
#elif defined(__BIG_ENDIAN_BITFIELD)
union perf_mem_data_src {
__u64 val;
struct {
- __u64 mem_rsvd : 18,
+ __u64 mem_rsvd : 6,
+ mem_cond : 1, /* Conditional */
+ mem_sg : 1, /* Scatter/Gather */
+ mem_ar : 1, /* Acquire/release */
+ mem_excl : 1, /* Exclusive */
+ mem_atomic : 1, /* Atomic operation */
+ mem_pred : 1, /* Predicated */
+ mem_fp : 1, /* Floating-point */
+ mem_dp : 1, /* Data processing */
+ mem_op_ext : 4, /* Extended type of opcode */
mem_hops : 3, /* Hop level */
mem_blk : 3, /* Access blocked */
mem_snoopx : 2, /* Snoop mode, ext */
@@ -1447,6 +1465,16 @@ union perf_mem_data_src {
/* 5-7 available */
#define PERF_MEM_HOPS_SHIFT 43
+/* Extended type of memory opcode: */
+#define PERF_MEM_EXT_OP_NA 0x0 /* Not available */
+#define PERF_MEM_EXT_OP_MTE_TAG 0x1 /* MTE tag */
+#define PERF_MEM_EXT_OP_NESTED_VIRT 0x2 /* Nested virtualization */
+#define PERF_MEM_EXT_OP_MEMCPY 0x3 /* Memory copy */
+#define PERF_MEM_EXT_OP_MEMSET 0x4 /* Memory set */
+#define PERF_MEM_EXT_OP_SIMD 0x5 /* SIMD */
+#define PERF_MEM_EXT_OP_GCS 0x6 /* Guarded Control Stack */
+#define PERF_MEM_EXT_OP_SHIFT 46
+
#define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
--
2.34.1
* [PATCH RESEND v4 2/8] tools/include: Sync uapi/linux/perf.h with the kernel sources
2026-01-06 12:07 [PATCH RESEND v4 0/8] perf arm_spe: Extend operations Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 1/8] perf/uapi: Extend data source fields Leo Yan
@ 2026-01-06 12:07 ` Leo Yan
2026-01-16 15:27 ` Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 3/8] perf mem: Print extended fields Leo Yan
` (5 subsequent siblings)
7 siblings, 1 reply; 10+ messages in thread
From: Leo Yan @ 2026-01-06 12:07 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel, Leo Yan
Sync for extended memory operation bit fields.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/include/uapi/linux/perf_event.h | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c44a8fb3e4181c91a1e6e3a40e23fcf1de421af3..3d2c5ee9282efc4a2310f554443082f1d0027889 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1330,14 +1330,32 @@ union perf_mem_data_src {
mem_snoopx : 2, /* Snoop mode, ext */
mem_blk : 3, /* Access blocked */
mem_hops : 3, /* Hop level */
- mem_rsvd : 18;
+ mem_op_ext : 4, /* Extended type of opcode */
+ mem_dp : 1, /* Data processing */
+ mem_fp : 1, /* Floating-point */
+ mem_pred : 1, /* Predicated */
+ mem_atomic : 1, /* Atomic operation */
+ mem_excl : 1, /* Exclusive */
+ mem_ar : 1, /* Acquire/release */
+ mem_sg : 1, /* Scatter/Gather */
+ mem_cond : 1, /* Conditional */
+ mem_rsvd : 6;
};
};
#elif defined(__BIG_ENDIAN_BITFIELD)
union perf_mem_data_src {
__u64 val;
struct {
- __u64 mem_rsvd : 18,
+ __u64 mem_rsvd : 6,
+ mem_cond : 1, /* Conditional */
+ mem_sg : 1, /* Scatter/Gather */
+ mem_ar : 1, /* Acquire/release */
+ mem_excl : 1, /* Exclusive */
+ mem_atomic : 1, /* Atomic operation */
+ mem_pred : 1, /* Predicated */
+ mem_fp : 1, /* Floating-point */
+ mem_dp : 1, /* Data processing */
+ mem_op_ext : 4, /* Extended type of opcode */
mem_hops : 3, /* Hop level */
mem_blk : 3, /* Access blocked */
mem_snoopx : 2, /* Snoop mode, ext */
@@ -1447,6 +1465,16 @@ union perf_mem_data_src {
/* 5-7 available */
#define PERF_MEM_HOPS_SHIFT 43
+/* Extended type of memory opcode: */
+#define PERF_MEM_EXT_OP_NA 0x0 /* Not available */
+#define PERF_MEM_EXT_OP_MTE_TAG 0x1 /* MTE tag */
+#define PERF_MEM_EXT_OP_NESTED_VIRT 0x2 /* Nested virtualization */
+#define PERF_MEM_EXT_OP_MEMCPY 0x3 /* Memory copy */
+#define PERF_MEM_EXT_OP_MEMSET 0x4 /* Memory set */
+#define PERF_MEM_EXT_OP_SIMD 0x5 /* SIMD */
+#define PERF_MEM_EXT_OP_GCS 0x6 /* Guarded Control Stack */
+#define PERF_MEM_EXT_OP_SHIFT 46
+
#define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
--
2.34.1
* [PATCH RESEND v4 3/8] perf mem: Print extended fields
2026-01-06 12:07 [PATCH RESEND v4 0/8] perf arm_spe: Extend operations Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 1/8] perf/uapi: Extend data source fields Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 2/8] tools/include: Sync uapi/linux/perf.h with the kernel sources Leo Yan
@ 2026-01-06 12:07 ` Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 4/8] perf arm_spe: Set extended fields in data source Leo Yan
` (4 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-01-06 12:07 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel, Leo Yan
Print the extended operation types and the affiliated info.
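The affiliated info is printed by appending one short tag per set
attribute bit, falling back to " N/A" when none is set. The sketch below
reproduces that pattern in plain userspace C, assuming standard
snprintf() with manual clamping in place of perf's scnprintf(). The
struct aff_flags type and the append() / aff_to_str() helpers are
hypothetical stand-ins, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the eight attribute bits in perf_mem_data_src. */
struct aff_flags {
	unsigned int dp:1, fp:1, pred:1, atomic:1, excl:1, ar:1, sg:1, cond:1;
};

/* Append s at offset l; return the new length, never exceeding sz - 1. */
static size_t append(char *out, size_t sz, size_t l, const char *s)
{
	int n;

	if (l + 1 >= sz)
		return l;		/* no room left besides the terminator */
	n = snprintf(out + l, sz - l, "%s", s);
	if (n < 0)
		return l;
	if ((size_t)n >= sz - l)	/* truncated: count only what was stored */
		n = (int)(sz - l - 1);
	return l + (size_t)n;
}

/* Format the set attribute bits as space-prefixed tags, or " N/A". */
static size_t aff_to_str(char *out, size_t sz, struct aff_flags f)
{
	size_t l = 0;

	assert(out && sz > 0);
	out[0] = '\0';

	if (f.dp)
		l = append(out, sz, l, " DP");
	if (f.fp)
		l = append(out, sz, l, " FP");
	if (f.pred)
		l = append(out, sz, l, " PRED");
	if (f.atomic)
		l = append(out, sz, l, " ATOMIC");
	if (f.excl)
		l = append(out, sz, l, " EX");
	if (f.ar)
		l = append(out, sz, l, " AR");
	if (f.sg)
		l = append(out, sz, l, " SG");
	if (f.cond)
		l = append(out, sz, l, " COND");
	if (!l)
		l = append(out, sz, l, " N/A");

	return l;
}
```

With only pred and sg set, the buffer holds " PRED SG", which together
with the "|AFF" prefix gives the "|AFF PRED SG" form.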
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/mem-events.c | 66 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 60 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 0b49fce251fcc18417cb1037075b3e406a3e6481..19d46d04de182cadbd6dfad23a7ab987cc4dcaae 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -416,11 +416,15 @@ static const char * const mem_hops[] = {
static int perf_mem__op_scnprintf(char *out, size_t sz, const struct mem_info *mem_info)
{
- u64 op = PERF_MEM_LOCK_NA;
+ union perf_mem_data_src data_src;
+ u64 op = PERF_MEM_OP_NA, ext_op = 0;
int l;
- if (mem_info)
- op = mem_info__const_data_src(mem_info)->mem_op;
+ if (mem_info) {
+ data_src = *mem_info__const_data_src(mem_info);
+ op = data_src.mem_op;
+ ext_op = data_src.mem_op_ext;
+ }
if (op & PERF_MEM_OP_NA)
l = scnprintf(out, sz, "N/A");
@@ -435,6 +439,19 @@ static int perf_mem__op_scnprintf(char *out, size_t sz, const struct mem_info *m
else
l = scnprintf(out, sz, "No");
+ if (ext_op == PERF_MEM_EXT_OP_MTE_TAG)
+ l += scnprintf(out + l, sz - l, " MTE");
+ else if (ext_op == PERF_MEM_EXT_OP_NESTED_VIRT)
+ l += scnprintf(out + l, sz - l, " NV");
+ else if (ext_op == PERF_MEM_EXT_OP_MEMCPY)
+ l += scnprintf(out + l, sz - l, " MEMCPY");
+ else if (ext_op == PERF_MEM_EXT_OP_MEMSET)
+ l += scnprintf(out + l, sz - l, " MEMSET");
+ else if (ext_op == PERF_MEM_EXT_OP_SIMD)
+ l += scnprintf(out + l, sz - l, " SIMD");
+ else if (ext_op == PERF_MEM_EXT_OP_GCS)
+ l += scnprintf(out + l, sz - l, " GCS");
+
return l;
}
@@ -585,9 +602,6 @@ int perf_mem__blk_scnprintf(char *out, size_t sz, const struct mem_info *mem_inf
size_t l = 0;
u64 mask = PERF_MEM_BLK_NA;
- sz -= 1; /* -1 for null termination */
- out[0] = '\0';
-
if (mem_info)
mask = mem_info__const_data_src(mem_info)->mem_blk;
@@ -603,6 +617,44 @@ int perf_mem__blk_scnprintf(char *out, size_t sz, const struct mem_info *mem_inf
return l;
}
+static int perf_mem__aff_scnprintf(char *out, size_t sz,
+ const struct mem_info *mem_info)
+{
+ union perf_mem_data_src data_src;
+ size_t l = 0;
+
+ sz -= 1; /* -1 for null termination */
+ out[0] = '\0';
+
+ if (!mem_info)
+ goto out;
+
+ data_src = *mem_info__const_data_src(mem_info);
+
+ if (data_src.mem_dp)
+ l += scnprintf(out + l, sz - l, " DP");
+ if (data_src.mem_fp)
+ l += scnprintf(out + l, sz - l, " FP");
+ if (data_src.mem_pred)
+ l += scnprintf(out + l, sz - l, " PRED");
+ if (data_src.mem_atomic)
+ l += scnprintf(out + l, sz - l, " ATOMIC");
+ if (data_src.mem_excl)
+ l += scnprintf(out + l, sz - l, " EX");
+ if (data_src.mem_ar)
+ l += scnprintf(out + l, sz - l, " AR");
+ if (data_src.mem_sg)
+ l += scnprintf(out + l, sz - l, " SG");
+ if (data_src.mem_cond)
+ l += scnprintf(out + l, sz - l, " COND");
+
+out:
+ if (!l)
+ l += scnprintf(out + l, sz - l, " N/A");
+
+ return l;
+}
+
int perf_script__meminfo_scnprintf(char *out, size_t sz, const struct mem_info *mem_info)
{
int i = 0;
@@ -619,6 +671,8 @@ int perf_script__meminfo_scnprintf(char *out, size_t sz, const struct mem_info *
i += perf_mem__lck_scnprintf(out + i, sz - i, mem_info);
i += scnprintf(out + i, sz - i, "|BLK ");
i += perf_mem__blk_scnprintf(out + i, sz - i, mem_info);
+ i += scnprintf(out + i, sz - i, "|AFF");
+ i += perf_mem__aff_scnprintf(out + i, sz - i, mem_info);
return i;
}
--
2.34.1
* [PATCH RESEND v4 4/8] perf arm_spe: Set extended fields in data source
2026-01-06 12:07 [PATCH RESEND v4 0/8] perf arm_spe: Extend operations Leo Yan
` (2 preceding siblings ...)
2026-01-06 12:07 ` [PATCH RESEND v4 3/8] perf mem: Print extended fields Leo Yan
@ 2026-01-06 12:07 ` Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 5/8] perf sort: Support sort ASE and SME Leo Yan
` (3 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-01-06 12:07 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel, Leo Yan
Set the extended operation type and the affiliated info in the data source.
Before:
perf script -F,dso,sym,data_src
sve-test 6516696.714341: 288100144 |OP STORE|LVL L1 hit|SNP None|TLB Walker hit|LCK No|BLK N/A|AFF N/A
sve-test 6516696.714341: 288100144 |OP STORE|LVL L1 hit|SNP None|TLB Walker hit|LCK No|BLK N/A|AFF N/A
sve-test 6516696.714341: 288100144 |OP STORE|LVL L1 hit|SNP None|TLB Walker hit|LCK No|BLK N/A|AFF N/A
sve-test 6516696.714344: 288800142 |OP LOAD|LVL L1 hit|SNP HitM|TLB Walker hit|LCK No|BLK N/A|AFF N/A
sve-test 6516696.714344: 288800142 |OP LOAD|LVL L1 hit|SNP HitM|TLB Walker hit|LCK No|BLK N/A|AFF N/A
After:
perf script -F,dso,sym,data_src
sve-test 6516696.714341: 444000288100144 |OP STORE SIMD|LVL L1 hit|SNP None|TLB Walker hit|LCK No|BLK N/A|AFF PRED SG
sve-test 6516696.714341: 444000288100144 |OP STORE SIMD|LVL L1 hit|SNP None|TLB Walker hit|LCK No|BLK N/A|AFF PRED SG
sve-test 6516696.714341: 444000288100144 |OP STORE SIMD|LVL L1 hit|SNP None|TLB Walker hit|LCK No|BLK N/A|AFF PRED SG
sve-test 6516696.714344: 288800142 |OP LOAD|LVL L1 hit|SNP HitM|TLB Walker hit|LCK No|BLK N/A|AFF N/A
sve-test 6516696.714344: 288800142 |OP LOAD|LVL L1 hit|SNP HitM|TLB Walker hit|LCK No|BLK N/A|AFF N/A
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index dc19e72258f30dd6d89dafbb70cc5f7b5c485589..8e9004213e163a46fcacecb4ee32e199a0ec50b2 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -1006,6 +1006,36 @@ arm_spe__synth_data_source(struct arm_spe_queue *speq,
else
data_src.mem_op = PERF_MEM_OP_NA;
+ if (record->op & ARM_SPE_OP_MTE_TAG)
+ data_src.mem_op_ext = PERF_MEM_EXT_OP_MTE_TAG;
+ else if (record->op & ARM_SPE_OP_NV_SYSREG)
+ data_src.mem_op_ext = PERF_MEM_EXT_OP_NESTED_VIRT;
+ else if (record->op & ARM_SPE_OP_MEMCPY)
+ data_src.mem_op_ext = PERF_MEM_EXT_OP_MEMCPY;
+ else if (record->op & ARM_SPE_OP_MEMSET)
+ data_src.mem_op_ext = PERF_MEM_EXT_OP_MEMSET;
+ else if (record->op & ARM_SPE_OP_GCS)
+ data_src.mem_op_ext = PERF_MEM_EXT_OP_GCS;
+ else if (is_simd_op(record->op))
+ data_src.mem_op_ext = PERF_MEM_EXT_OP_SIMD;
+
+ if (record->op & ARM_SPE_OP_DP)
+ data_src.mem_dp = 1;
+ if (record->op & ARM_SPE_OP_FP)
+ data_src.mem_fp = 1;
+ if (record->op & ARM_SPE_OP_PRED)
+ data_src.mem_pred = 1;
+ if (record->op & ARM_SPE_OP_ATOMIC)
+ data_src.mem_atomic = 1;
+ if (record->op & ARM_SPE_OP_EXCL)
+ data_src.mem_excl = 1;
+ if (record->op & ARM_SPE_OP_AR)
+ data_src.mem_ar = 1;
+ if (record->op & ARM_SPE_OP_SG)
+ data_src.mem_sg = 1;
+ if (record->op & ARM_SPE_OP_COND)
+ data_src.mem_cond = 1;
+
arm_spe__synth_ds(speq, record, &data_src);
arm_spe__synth_memory_level(speq, record, &data_src);
--
2.34.1
* [PATCH RESEND v4 5/8] perf sort: Support sort ASE and SME
2026-01-06 12:07 [PATCH RESEND v4 0/8] perf arm_spe: Extend operations Leo Yan
` (3 preceding siblings ...)
2026-01-06 12:07 ` [PATCH RESEND v4 4/8] perf arm_spe: Set extended fields in data source Leo Yan
@ 2026-01-06 12:07 ` Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 6/8] perf sort: Sort disabled and full predicated flags Leo Yan
` (2 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-01-06 12:07 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel, Leo Yan
Support sorting by Advanced SIMD extension (ASE) and SME.
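The arch field changes from a single bit flag to a 2-bit enumerated
value, which is why the lookup has to compare with == rather than test
with &. The sketch below copies the enum from this patch. The helper
names simd_arch_name() and old_test_matches_sve() are hypothetical and
only show the difference between the two tests.

```c
#include <assert.h>

/* Enum values as introduced by this patch. */
enum simd_op_flags {
	SIMD_OP_FLAGS_ARCH_NONE = 0x0,	/* No SIMD operation */
	SIMD_OP_FLAGS_ARCH_SVE,		/* Arm SVE */
	SIMD_OP_FLAGS_ARCH_SME,		/* Arm SME */
	SIMD_OP_FLAGS_ARCH_ASE,		/* Arm Advanced SIMD */
};

static_assert(SIMD_OP_FLAGS_ARCH_ASE <= 0x3,
	      "arch values must fit in the 2-bit field");

/* Equality-based lookup, matching the new code in sort.c. */
static const char *simd_arch_name(unsigned int arch)
{
	switch (arch) {
	case SIMD_OP_FLAGS_ARCH_SVE:
		return "SVE";
	case SIMD_OP_FLAGS_ARCH_SME:
		return "SME";
	case SIMD_OP_FLAGS_ARCH_ASE:
		return "ASE";
	default:
		return "n/a";
	}
}

/* The old bitmask test: ASE (0x3) shares bit 0 with SVE (0x1). */
static int old_test_matches_sve(unsigned int arch)
{
	return (arch & SIMD_OP_FLAGS_ARCH_SVE) != 0;
}
```

Because ASE is encoded as 0x3, keeping the bitmask test would report
ASE samples as SVE.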
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/sample.h | 12 +++++++++---
tools/perf/util/sort.c | 6 +++++-
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index a8307b20a9ea80668deecf65e316ab6036afbfeb..504256474f22fa8ec647429d182a6d04f8d05c39 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -67,12 +67,18 @@ struct aux_sample {
};
struct simd_flags {
- u8 arch:1, /* architecture (isa) */
- pred:2; /* predication */
+ u8 arch: 2, /* architecture (isa) */
+ pred: 2, /* predication */
+ resv: 4; /* reserved */
};
/* simd architecture flags */
-#define SIMD_OP_FLAGS_ARCH_SVE 0x01 /* ARM SVE */
+enum simd_op_flags {
+ SIMD_OP_FLAGS_ARCH_NONE = 0x0, /* No SIMD operation */
+ SIMD_OP_FLAGS_ARCH_SVE, /* Arm SVE */
+ SIMD_OP_FLAGS_ARCH_SME, /* Arm SME */
+ SIMD_OP_FLAGS_ARCH_ASE, /* Arm Advanced SIMD */
+};
/* simd predicate flags */
#define SIMD_OP_FLAGS_PRED_PARTIAL 0x01 /* partial predicate */
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f963d61ac166f9b3f16069b07aa4b0dacfb65c02..e8f793eed33f78b7688b3f949e8254cbb64c3709 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -193,8 +193,12 @@ static const char *hist_entry__get_simd_name(struct simd_flags *simd_flags)
{
u64 arch = simd_flags->arch;
- if (arch & SIMD_OP_FLAGS_ARCH_SVE)
+ if (arch == SIMD_OP_FLAGS_ARCH_SVE)
return "SVE";
+ else if (arch == SIMD_OP_FLAGS_ARCH_SME)
+ return "SME";
+ else if (arch == SIMD_OP_FLAGS_ARCH_ASE)
+ return "ASE";
else
return "n/a";
}
--
2.34.1
* [PATCH RESEND v4 6/8] perf sort: Sort disabled and full predicated flags
2026-01-06 12:07 [PATCH RESEND v4 0/8] perf arm_spe: Extend operations Leo Yan
` (4 preceding siblings ...)
2026-01-06 12:07 ` [PATCH RESEND v4 5/8] perf sort: Support sort ASE and SME Leo Yan
@ 2026-01-06 12:07 ` Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 7/8] perf report: Update document for SIMD flags Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 8/8] perf arm_spe: Improve SIMD flags setting Leo Yan
7 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-01-06 12:07 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel, Leo Yan
According to the Arm ARM (ARM DDI 0487, L.a), section D18.2.6
"Events packet", apart from the empty and partial predicates, an SVE
or SME operation can be predicate-disabled or fully predicated.
To provide complete results, introduce two predicate types for
these cases.
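With two more predicate types there are five possible values (0 to 4),
so the pred field needs 3 bits. The sketch below copies the enum from
this patch and shows the one-letter tag per value. The pred_tag() and
format_simd() helpers are hypothetical, and standard snprintf() stands
in for perf's repsep_snprintf().

```c
#include <assert.h>
#include <stdio.h>

/* Enum values as introduced by this patch. */
enum simd_pred_flags {
	SIMD_OP_FLAGS_PRED_NONE = 0x0,	/* Not available */
	SIMD_OP_FLAGS_PRED_PARTIAL,	/* partial predicate */
	SIMD_OP_FLAGS_PRED_EMPTY,	/* empty predicate */
	SIMD_OP_FLAGS_PRED_FULL,	/* full predicate */
	SIMD_OP_FLAGS_PRED_DISABLED,	/* disabled predicate */
};

static_assert(SIMD_OP_FLAGS_PRED_DISABLED < (1 << 3),
	      "pred values must fit in the 3-bit field");

/* Map a predicate value to the tag shown in the simd sort column. */
static const char *pred_tag(unsigned int pred)
{
	switch (pred) {
	case SIMD_OP_FLAGS_PRED_EMPTY:
		return "e";
	case SIMD_OP_FLAGS_PRED_PARTIAL:
		return "p";
	case SIMD_OP_FLAGS_PRED_DISABLED:
		return "d";
	case SIMD_OP_FLAGS_PRED_FULL:
		return "f";
	default:
		return ".";
	}
}

/* Produce the "[tag] NAME" form, e.g. "[d] SVE". */
static int format_simd(char *bf, size_t size, unsigned int pred,
		       const char *name)
{
	return snprintf(bf, size, "[%s] %s", pred_tag(pred), name);
}
```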
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/sample.h | 13 +++++++++----
tools/perf/util/sort.c | 15 ++++++++++-----
2 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index 504256474f22fa8ec647429d182a6d04f8d05c39..c0ba8dc6c055597e6b01c28eab7178d5c2b3c3ed 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -68,8 +68,8 @@ struct aux_sample {
struct simd_flags {
u8 arch: 2, /* architecture (isa) */
- pred: 2, /* predication */
- resv: 4; /* reserved */
+ pred: 3, /* predication */
+ resv: 3; /* reserved */
};
/* simd architecture flags */
@@ -81,8 +81,13 @@ enum simd_op_flags {
};
/* simd predicate flags */
-#define SIMD_OP_FLAGS_PRED_PARTIAL 0x01 /* partial predicate */
-#define SIMD_OP_FLAGS_PRED_EMPTY 0x02 /* empty predicate */
+enum simd_pred_flags {
+ SIMD_OP_FLAGS_PRED_NONE = 0x0, /* Not available */
+ SIMD_OP_FLAGS_PRED_PARTIAL, /* partial predicate */
+ SIMD_OP_FLAGS_PRED_EMPTY, /* empty predicate */
+ SIMD_OP_FLAGS_PRED_FULL, /* full predicate */
+ SIMD_OP_FLAGS_PRED_DISABLED, /* disabled predicate */
+};
struct perf_sample {
u64 ip;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index e8f793eed33f78b7688b3f949e8254cbb64c3709..72ad35559bc2ff89ba65208bb18d4db3224be034 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -207,18 +207,23 @@ static int hist_entry__simd_snprintf(struct hist_entry *he, char *bf,
size_t size, unsigned int width __maybe_unused)
{
const char *name;
+ const char *pred_str = ".";
if (!he->simd_flags.arch)
return repsep_snprintf(bf, size, "");
name = hist_entry__get_simd_name(&he->simd_flags);
- if (he->simd_flags.pred & SIMD_OP_FLAGS_PRED_EMPTY)
- return repsep_snprintf(bf, size, "[e] %s", name);
- else if (he->simd_flags.pred & SIMD_OP_FLAGS_PRED_PARTIAL)
- return repsep_snprintf(bf, size, "[p] %s", name);
+ if (he->simd_flags.pred == SIMD_OP_FLAGS_PRED_EMPTY)
+ pred_str = "e";
+ else if (he->simd_flags.pred == SIMD_OP_FLAGS_PRED_PARTIAL)
+ pred_str = "p";
+ else if (he->simd_flags.pred == SIMD_OP_FLAGS_PRED_DISABLED)
+ pred_str = "d";
+ else if (he->simd_flags.pred == SIMD_OP_FLAGS_PRED_FULL)
+ pred_str = "f";
- return repsep_snprintf(bf, size, "[.] %s", name);
+ return repsep_snprintf(bf, size, "[%s] %s", pred_str, name);
}
struct sort_entry sort_simd = {
--
2.34.1
* [PATCH RESEND v4 7/8] perf report: Update document for SIMD flags
2026-01-06 12:07 [PATCH RESEND v4 0/8] perf arm_spe: Extend operations Leo Yan
` (5 preceding siblings ...)
2026-01-06 12:07 ` [PATCH RESEND v4 6/8] perf sort: Sort disabled and full predicated flags Leo Yan
@ 2026-01-06 12:07 ` Leo Yan
2026-01-06 12:07 ` [PATCH RESEND v4 8/8] perf arm_spe: Improve SIMD flags setting Leo Yan
7 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-01-06 12:07 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel, Leo Yan
Update SIMD architecture and predicate flags.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/Documentation/perf-report.txt | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index acef3ff4178eff66e8f876ae16cdac7b1387f07b..f361081a65dbe9cead539c7cb81d6ed86eb0acc6 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -136,7 +136,10 @@ OPTIONS
- addr: (Full) virtual address of the sampled instruction
- retire_lat: On X86, this reports pipeline stall of this instruction compared
to the previous instruction in cycles. And currently supported only on X86
- - simd: Flags describing a SIMD operation. "e" for empty Arm SVE predicate. "p" for partial Arm SVE predicate
+ - simd: Flags describing a SIMD operation. The architecture type can be Arm's
+ ASE (Advanced SIMD extension), SVE, SME. It provides an extra tag for
+ predicate: "e" for empty predicate, "p" for partial predicate, "d" for
+ predicate disabled, and "f" for full predicate.
- type: Data type of sample memory access.
- typeoff: Offset in the data type of sample memory access.
- symoff: Offset in the symbol.
--
2.34.1
* [PATCH RESEND v4 8/8] perf arm_spe: Improve SIMD flags setting
2026-01-06 12:07 [PATCH RESEND v4 0/8] perf arm_spe: Extend operations Leo Yan
` (6 preceding siblings ...)
2026-01-06 12:07 ` [PATCH RESEND v4 7/8] perf report: Update document for SIMD flags Leo Yan
@ 2026-01-06 12:07 ` Leo Yan
7 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-01-06 12:07 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel, Leo Yan
Fill in ASE and SME operations for the SIMD arch field.
Also set the predicate flags for SVE and SME, which differ in one
respect: SME does not have a predicate flag, so the setting is based on
events alone. SVE provides a predicate flag to indicate whether the
predicate is disabled, which allows four cases to be distinguished:
partial predicates, empty predicates, full predicates, and disabled
predicates.
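The decision can be sketched as a small classifier that follows the same
order of checks as the diff in this patch. The enum is the one added to
sample.h earlier in this series. The struct spe_rec type with its
boolean fields and the classify_pred() helper are hypothetical; they
stand in for the ARM_SPE_OP_* and ARM_SPE_SVE_* bit tests on the real
record.

```c
#include <assert.h>
#include <stdbool.h>

enum simd_pred_flags {
	SIMD_OP_FLAGS_PRED_NONE = 0x0,	/* Not available */
	SIMD_OP_FLAGS_PRED_PARTIAL,	/* partial predicate */
	SIMD_OP_FLAGS_PRED_EMPTY,	/* empty predicate */
	SIMD_OP_FLAGS_PRED_FULL,	/* full predicate */
	SIMD_OP_FLAGS_PRED_DISABLED,	/* disabled predicate */
};

/* Hypothetical decoded view of one SPE record. */
struct spe_rec {
	bool is_sve;		/* operation type is SVE */
	bool op_pred;		/* operation carries the predicate flag */
	bool ev_partial;	/* partial predicate event is set */
	bool ev_empty;		/* empty predicate event is set */
};

static enum simd_pred_flags classify_pred(const struct spe_rec *r)
{
	assert(r);

	/* Only SVE reports whether predication is disabled. */
	if (r->is_sve && !r->op_pred)
		return SIMD_OP_FLAGS_PRED_DISABLED;

	/* Events are consulted for SVE and non-SVE operations alike. */
	if (r->ev_partial)
		return SIMD_OP_FLAGS_PRED_PARTIAL;
	if (r->ev_empty)
		return SIMD_OP_FLAGS_PRED_EMPTY;

	/* A predicated SVE operation with no event is fully predicated. */
	return r->is_sve ? SIMD_OP_FLAGS_PRED_FULL : SIMD_OP_FLAGS_PRED_NONE;
}
```

Without the predicate flag, a non-SVE operation with neither event set
cannot be told apart from one with predication disabled, so it stays at
the "not available" value.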
After:
perf report -s +simd
...
0.06% 0.06% sve-test sve-test [.] setz [p] SVE
0.06% 0.06% sve-test [kernel.kallsyms] [k] do_raw_spin_lock
0.06% 0.06% sve-test sve-test [.] getz [p] SVE
0.06% 0.06% sve-test [kernel.kallsyms] [k] timekeeping_advance
0.06% 0.06% sve-test sve-test [.] getz [d] SVE
0.06% 0.06% sve-test [kernel.kallsyms] [k] update_load_avg
0.06% 0.06% sve-test sve-test [.] getz [e] SVE
0.05% 0.05% sve-test sve-test [.] setz [e] SVE
0.05% 0.05% sve-test [kernel.kallsyms] [k] update_curr
0.05% 0.05% sve-test sve-test [.] setz [d] SVE
0.05% 0.05% sve-test [kernel.kallsyms] [k] do_raw_spin_unlock
0.05% 0.05% sve-test [kernel.kallsyms] [k] timekeeping_update_from_shadow.constprop.0
0.05% 0.05% sve-test sve-test [.] getz [f] SVE
0.05% 0.05% sve-test sve-test [.] setz [f] SVE
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
tools/perf/util/arm-spe.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 8e9004213e163a46fcacecb4ee32e199a0ec50b2..0a20591d49e821357d3d9ac559874179f1d4f378 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -353,12 +353,26 @@ static struct simd_flags arm_spe__synth_simd_flags(const struct arm_spe_record *
if (record->op & ARM_SPE_OP_SVE)
simd_flags.arch |= SIMD_OP_FLAGS_ARCH_SVE;
-
- if (record->type & ARM_SPE_SVE_PARTIAL_PRED)
- simd_flags.pred |= SIMD_OP_FLAGS_PRED_PARTIAL;
-
- if (record->type & ARM_SPE_SVE_EMPTY_PRED)
- simd_flags.pred |= SIMD_OP_FLAGS_PRED_EMPTY;
+ else if (record->op & ARM_SPE_OP_SME)
+ simd_flags.arch |= SIMD_OP_FLAGS_ARCH_SME;
+ else if (record->op & (ARM_SPE_OP_ASE | ARM_SPE_OP_SIMD_FP))
+ simd_flags.arch |= SIMD_OP_FLAGS_ARCH_ASE;
+
+ if (record->op & ARM_SPE_OP_SVE) {
+ if (!(record->op & ARM_SPE_OP_PRED))
+ simd_flags.pred = SIMD_OP_FLAGS_PRED_DISABLED;
+ else if (record->type & ARM_SPE_SVE_PARTIAL_PRED)
+ simd_flags.pred = SIMD_OP_FLAGS_PRED_PARTIAL;
+ else if (record->type & ARM_SPE_SVE_EMPTY_PRED)
+ simd_flags.pred = SIMD_OP_FLAGS_PRED_EMPTY;
+ else
+ simd_flags.pred = SIMD_OP_FLAGS_PRED_FULL;
+ } else {
+ if (record->type & ARM_SPE_SVE_PARTIAL_PRED)
+ simd_flags.pred = SIMD_OP_FLAGS_PRED_PARTIAL;
+ else if (record->type & ARM_SPE_SVE_EMPTY_PRED)
+ simd_flags.pred = SIMD_OP_FLAGS_PRED_EMPTY;
+ }
return simd_flags;
}
--
2.34.1
* Re: [PATCH RESEND v4 2/8] tools/include: Sync uapi/linux/perf.h with the kernel sources
2026-01-06 12:07 ` [PATCH RESEND v4 2/8] tools/include: Sync uapi/linux/perf.h with the kernel sources Leo Yan
@ 2026-01-16 15:27 ` Leo Yan
0 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-01-16 15:27 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Mark Rutland
Cc: Arnaldo Carvalho de Melo, linux-perf-users, linux-arm-kernel,
linux-kernel
On Tue, Jan 06, 2026 at 12:07:52PM +0000, Leo Yan wrote:
> Sync for extended memory operation bit fields.
Hi Peter, Ingo,
This patch is important for enabling the Arm SPE feature. I would
appreciate it if you could review it; otherwise, the changes in perf
cannot proceed.
Thanks a lot!
> Reviewed-by: James Clark <james.clark@linaro.org>
> Reviewed-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> tools/include/uapi/linux/perf_event.h | 32 ++++++++++++++++++++++++++++++--
> 1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index c44a8fb3e4181c91a1e6e3a40e23fcf1de421af3..3d2c5ee9282efc4a2310f554443082f1d0027889 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -1330,14 +1330,32 @@ union perf_mem_data_src {
> mem_snoopx : 2, /* Snoop mode, ext */
> mem_blk : 3, /* Access blocked */
> mem_hops : 3, /* Hop level */
> - mem_rsvd : 18;
> + mem_op_ext : 4, /* Extended type of opcode */
> + mem_dp : 1, /* Data processing */
> + mem_fp : 1, /* Floating-point */
> + mem_pred : 1, /* Predicated */
> + mem_atomic : 1, /* Atomic operation */
> + mem_excl : 1, /* Exclusive */
> + mem_ar : 1, /* Acquire/release */
> + mem_sg : 1, /* Scatter/Gather */
> + mem_cond : 1, /* Conditional */
> + mem_rsvd : 6;
> };
> };
> #elif defined(__BIG_ENDIAN_BITFIELD)
> union perf_mem_data_src {
> __u64 val;
> struct {
> - __u64 mem_rsvd : 18,
> + __u64 mem_rsvd : 6,
> + mem_cond : 1, /* Conditional */
> + mem_sg : 1, /* Scatter/Gather */
> + mem_ar : 1, /* Acquire/release */
> + mem_excl : 1, /* Exclusive */
> + mem_atomic : 1, /* Atomic operation */
> + mem_pred : 1, /* Predicated */
> + mem_fp : 1, /* Floating-point */
> + mem_dp : 1, /* Data processing */
> + mem_op_ext : 4, /* Extended type of opcode */
> mem_hops : 3, /* Hop level */
> mem_blk : 3, /* Access blocked */
> mem_snoopx : 2, /* Snoop mode, ext */
> @@ -1447,6 +1465,16 @@ union perf_mem_data_src {
> /* 5-7 available */
> #define PERF_MEM_HOPS_SHIFT 43
>
> +/* Extended type of memory opcode: */
> +#define PERF_MEM_EXT_OP_NA 0x0 /* Not available */
> +#define PERF_MEM_EXT_OP_MTE_TAG 0x1 /* MTE tag */
> +#define PERF_MEM_EXT_OP_NESTED_VIRT 0x2 /* Nested virtualization */
> +#define PERF_MEM_EXT_OP_MEMCPY 0x3 /* Memory copy */
> +#define PERF_MEM_EXT_OP_MEMSET 0x4 /* Memory set */
> +#define PERF_MEM_EXT_OP_SIMD 0x5 /* SIMD */
> +#define PERF_MEM_EXT_OP_GCS 0x6 /* Guarded Control Stack */
> +#define PERF_MEM_EXT_OP_SHIFT 46
> +
> #define PERF_MEM_S(a, s) \
> (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
>
>
> --
> 2.34.1
>