* [PATCH v2 00/16] perf arm64: Support data type profiling
From: Tengda Wu @ 2026-04-03 9:47 UTC
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu

This patch series implements data type profiling support for arm64,
building upon the foundational work previously contributed by Huafei [1].
While the initial version laid the groundwork for arm64 data type analysis,
this series iterates on that work by refining instruction parsing and
extending support for core architectural features.

The series is organized as follows:

1. Fix disassembly mismatches (Patches 01-02)

   perf annotate currently supports three disassembly backends: llvm,
   capstone, and objdump. On arm64, inconsistencies between the output
   of these backends (specifically llvm/capstone vs. objdump) often
   prevent the tracker from correctly identifying registers and offsets.
   These patches resolve these mismatches, ensuring consistent instruction
   parsing across all supported backends.

2. Infrastructure for arm64 operand parsing (Patches 03-07)

   These patches establish the necessary infrastructure for arm64-specific
   operand handling. This includes implementing new callbacks and data
   structures to manage arm64's unique addressing modes and register sets.
   This foundation is essential for the subsequent type-tracking logic.

3. Core instruction tracking (Patches 08-16)

   These patches implement the core logic for type tracking on arm64,
   covering a wide range of instructions including:

   * Memory Access: ldr/str variants (including stack-based access).
   * Arithmetic & Data Processing: mov, add, and adrp.
   * Special Access: System register access (mrs) and per-cpu variable
     tracking.

The implementation draws inspiration from the existing x86 logic while
adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
perf annotate can resolve memory locations and register types, enabling
data type profiling on arm64 platforms.

Example Result
==============

  # perf mem record -a -K -- sleep 1
  # perf annotate --data-type --type-stat --stdio

  Annotate data type stats:
  total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
  -----------------------------------------------------------
          29 : no_sym
         196 : no_var
         806 : no_typeinfo
          82 : bad_offset
        1370 : insn_track

  Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
  ============================================================================
   Percent     offset       size  field
    100.00          0       0x40  struct page {
      9.95          0        0x8      long unsigned int flags;
     52.83        0x8       0x28      union {
     52.83        0x8       0x28          struct {
     37.21        0x8       0x10              union {
     37.21        0x8       0x10                  struct list_head lru {
     37.21        0x8        0x8                      struct list_head* next;
      0.00       0x10        0x8                      struct list_head* prev;
                                                  };
     37.21        0x8       0x10                  struct {
     37.21        0x8        0x8                      void* __filler;
      0.00       0x10        0x4                      unsigned int mlock_count;
  ...

Changes since v1 (reworked from Huafei's series):

 - Fix inconsistencies in arm64 instruction output across llvm, capstone,
   and objdump disassembly backends.
 - Support arm64-specific addressing modes and operand formats. (Leo Yan)
 - Extend instruction tracking to support mov and add instructions,
   along with per-cpu and stack variables.
 - Include real-world examples in commit messages to demonstrate
   practical effects. (Namhyung Kim)
 - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.

v1: https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/

Please let me know if you have any feedback.
Thanks,
Tengda
[1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
[2] https://developer.arm.com/documentation/102374/0103
[3] https://github.com/flynd/asmsheets/releases/tag/v8
---
Tengda Wu (16):
perf llvm: Fix arm64 adrp instruction disassembly mismatch with
objdump
perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
perf annotate-arm64: Generalize arm64_mov__parse to support standard
operands
perf annotate-arm64: Handle load and store instructions
perf annotate: Introduce extract_op_location callback for
arch-specific parsing
perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
perf annotate-arm64: Implement extract_op_location() callback
perf annotate-arm64: Enable instruction tracking support
perf annotate-arm64: Support load instruction tracking
perf annotate-arm64: Support store instruction tracking
perf annotate-arm64: Support stack variable tracking
perf annotate-arm64: Support 'mov' instruction tracking
perf annotate-arm64: Support 'add' instruction tracking
perf annotate-arm64: Support 'adrp' instruction to track global
variables
perf annotate-arm64: Support per-cpu variable access tracking
perf annotate-arm64: Support 'mrs' instruction to track 'current'
pointer
.../perf/util/annotate-arch/annotate-arm64.c | 642 +++++++++++++++++-
.../util/annotate-arch/annotate-powerpc.c | 10 +
tools/perf/util/annotate-arch/annotate-x86.c | 88 ++-
tools/perf/util/annotate-data.c | 72 +-
tools/perf/util/annotate-data.h | 7 +-
tools/perf/util/annotate.c | 108 +--
tools/perf/util/annotate.h | 12 +
tools/perf/util/capstone.c | 107 ++-
tools/perf/util/disasm.c | 5 +
tools/perf/util/disasm.h | 5 +
.../util/dwarf-regs-arch/dwarf-regs-arm64.c | 20 +
tools/perf/util/dwarf-regs.c | 2 +-
tools/perf/util/include/dwarf-regs.h | 1 +
tools/perf/util/llvm.c | 50 ++
14 files changed, 984 insertions(+), 145 deletions(-)
base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb
--
2.34.1
* [PATCH v2 01/16] perf llvm: Fix arm64 adrp instruction disassembly mismatch with objdump
From: Tengda Wu @ 2026-04-03 9:47 UTC

The operands of 'adrp' instructions parsed by libllvm are currently
represented as raw immediates rather than the "address <symbol+offset>"
format used by objdump. This inconsistency causes arm64_mov__parse()
to fail when parsing these instructions during post-processing.

Example of the mismatch:

  Current: adrp x18, 8014
  Fix:     adrp x18, ffff800081f5f000 <this_cpu_vector>

Fix this by manually extracting the target address from the raw adrp
instruction via symbol_lookup_callback(). The address is then converted
to a specific symbol during symbol__disassemble_llvm() and formatted
to match objdump's output, ensuring compatibility with existing
parsers.
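
For readers unfamiliar with the encoding, the address computation
described above can be sketched as a standalone helper. This is only an
illustration, assuming the raw 32-bit instruction word and its PC are
available; demo_adrp_target() is a hypothetical name and is not part of
this patch.

```c
#include <stdint.h>

/*
 * Illustrative helper (hypothetical name, not part of the patch):
 * compute the page address targeted by an AArch64 ADRP instruction.
 */
static uint64_t demo_adrp_target(uint32_t insn, uint64_t pc)
{
	/* immhi lives in bits 23:5, immlo in bits 30:29 */
	uint64_t immhi = (insn >> 5) & 0x7ffff;
	uint64_t immlo = (insn >> 29) & 0x3;
	uint64_t imm = (immhi << 2) | immlo;

	/* Sign-extend the 21-bit immediate to 64 bits */
	if (imm & (1ULL << 20))
		imm |= ~((1ULL << 21) - 1);

	/* The immediate counts 4 KiB pages relative to the PC's page */
	return (pc & ~0xfffULL) + (imm << 12);
}
```

The low 12 bits of the PC are discarded before the page offset is added,
so the result is always page-aligned; a following 'add' or 'ldr' would
normally supply the in-page offset.
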
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
tools/perf/util/llvm.c | 50 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/tools/perf/util/llvm.c b/tools/perf/util/llvm.c
index a0deb742a733..533d47e8084d 100644
--- a/tools/perf/util/llvm.c
+++ b/tools/perf/util/llvm.c
@@ -94,6 +94,7 @@ static void init_llvm(void)
struct symbol_lookup_storage {
u64 branch_addr;
u64 pcrel_load_addr;
+ u64 pcrel_adrp_addr;
};
static const char *
@@ -108,6 +109,18 @@ symbol_lookup_callback(void *disinfo, uint64_t value,
storage->branch_addr = value;
else if (*ref_type == LLVMDisassembler_ReferenceType_In_PCrel_Load)
storage->pcrel_load_addr = value;
+ else if (*ref_type == LLVMDisassembler_ReferenceType_In_ARM64_ADRP) {
+ uint64_t adrp_imm;
+
+ /* immhi (bits 23:5) and immlo (bits 30:29) */
+ adrp_imm = ((value & 0x00ffffe0) >> 3) | ((value >> 29) & 0x3);
+ /* Sign-extend the 21-bit immediate to 64-bit */
+ if (adrp_imm & (1ULL << 20))
+ adrp_imm |= ~((1ULL << 21) - 1);
+
+ /* Calculate the target page address */
+ storage->pcrel_adrp_addr = (address & ~0xFFFLL) + (adrp_imm << 12);
+ }
*ref_type = LLVMDisassembler_ReferenceType_InOut_None;
return NULL;
}
@@ -204,6 +217,7 @@ int symbol__disassemble_llvm(const char *filename, struct symbol *sym,
storage.branch_addr = 0;
storage.pcrel_load_addr = 0;
+ storage.pcrel_adrp_addr = 0;
/*
* LLVM's API has the code be disassembled as non-const, cast
@@ -227,6 +241,42 @@ int symbol__disassemble_llvm(const char *filename, struct symbol *sym,
free(name);
}
}
+ if (storage.pcrel_adrp_addr != 0) {
+ /*
+ * ADRP (Address Page) instructions encode a 21-bit signed
+ * immediate offset relative to the current PC's page.
+ *
+ * To maintain consistency with standard objdump output,
+ * we truncate the raw encoded immediate at the comma
+ * and replace it with the resolved absolute page address.
+ *
+ * Example conversion:
+ * From: adrp x18, 8014
+ * To: adrp x18, ffff800081f5f000 <this_cpu_vector>
+ */
+ char *name;
+ char *s = strchr(disasm_buf, ',');
+
+ if (s == NULL)
+ goto err;
+
+ s++;
+ *s = '\0';
+ disasm_len = strlen(disasm_buf);
+ disasm_len += scnprintf(disasm_buf + disasm_len,
+ sizeof(disasm_buf) - disasm_len,
+ " %"PRIx64,
+ storage.pcrel_adrp_addr);
+ name = llvm_name_for_data(dso, filename,
+ storage.pcrel_adrp_addr);
+ if (name) {
+ disasm_len += scnprintf(disasm_buf + disasm_len,
+ sizeof(disasm_buf) -
+ disasm_len,
+ " <%s>", name);
+ free(name);
+ }
+ }
if (storage.pcrel_load_addr != 0) {
char *name = llvm_name_for_data(dso, filename,
storage.pcrel_load_addr);
--
2.34.1
* [PATCH v2 02/16] perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
From: Tengda Wu @ 2026-04-03 9:47 UTC

The jump and adrp instructions parsed by libcapstone currently lack
symbolic representation and use a '#' prefix for addresses. This
format is inconsistent with objdump's output, which causes subsequent
parsing in jump__parse() and arm64_mov__parse() to fail.

Example mismatch:

  Current: b #0xffff8000800114c8
  Fix:     b ffff8000800114c8 <el0t_64_sync+0x108>

  Current: adrp x18, #0xffff800081f5f000
  Fix:     adrp x18, ffff800081f5f000 <this_cpu_vector>

Fix this by implementing extended formatting for these arm64
instructions during symbol__disassemble_capstone(). This ensures
the output matches objdump's expected style, including the raw
address and the associated <symbol+offset> suffix.
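
The rewrite described above can be tried outside of perf with a small
sketch. It assumes a flat, hypothetical symbol table (struct demo_sym)
in place of perf's map and symbol lookup, so it only illustrates the
string handling, not the map relocation done in the patch.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for perf's symbol lookup; not a perf structure. */
struct demo_sym {
	uint64_t start;
	uint64_t end;		/* exclusive */
	const char *name;
};

static const struct demo_sym *demo_find(const struct demo_sym *syms,
					size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= syms[i].start && addr < syms[i].end)
			return &syms[i];
	return NULL;
}

/*
 * Rewrite the trailing "#0x<addr>" of @buf in place to the objdump-like
 * "<addr> <symbol+offset>" form. Returns 0 on success, -1 if @buf is
 * left untouched.
 */
static int demo_symbolize(char *buf, size_t len,
			  const struct demo_sym *syms, size_t n)
{
	char *imm = strrchr(buf, '#');
	const struct demo_sym *sym;
	uint64_t addr;
	size_t room;
	char *end;

	if (!imm)
		return -1;

	/* strtoull() with base 16 accepts the optional "0x" prefix */
	addr = strtoull(imm + 1, &end, 16);
	if (end == imm + 1)
		return -1;

	sym = demo_find(syms, n, addr);
	if (!sym)
		return -1;

	room = len - (size_t)(imm - buf);
	if (addr == sym->start)
		snprintf(imm, room, "%" PRIx64 " <%s>", addr, sym->name);
	else
		snprintf(imm, room, "%" PRIx64 " <%s+%#" PRIx64 ">",
			 addr, sym->name, addr - sym->start);
	return 0;
}
```

The symbol addresses used to exercise this sketch are made up for
illustration; in perf the lookup goes through the map that contains the
target address.
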
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
tools/perf/util/capstone.c | 107 ++++++++++++++++++++++++++++++++-----
tools/perf/util/disasm.c | 5 ++
tools/perf/util/disasm.h | 1 +
3 files changed, 101 insertions(+), 12 deletions(-)
diff --git a/tools/perf/util/capstone.c b/tools/perf/util/capstone.c
index 25cf6e15ec27..1d8421d2d98c 100644
--- a/tools/perf/util/capstone.c
+++ b/tools/perf/util/capstone.c
@@ -255,10 +255,6 @@ static void print_capstone_detail(struct cs_insn *insn, char *buf, size_t len,
struct map *map = args->ms->map;
struct symbol *sym;
- /* TODO: support more architectures */
- if (!arch__is_x86(args->arch))
- return;
-
if (insn->detail == NULL)
return;
@@ -305,6 +301,98 @@ static void print_capstone_detail(struct cs_insn *insn, char *buf, size_t len,
}
}
+static void format_capstone_insn_x86(struct cs_insn *insn, char *buf,
+ size_t len, struct annotate_args *args,
+ u64 addr)
+{
+ int printed;
+
+ printed = scnprintf(buf, len, " %-7s %s",
+ insn->mnemonic, insn->op_str);
+ buf += printed;
+ len -= printed;
+
+ print_capstone_detail(insn, buf, len, args, addr);
+}
+
+static void format_capstone_insn_arm64(struct cs_insn *insn, char *buf,
+ size_t len, struct annotate_args *args,
+ u64 addr)
+{
+ struct map *map = args->ms->map;
+ struct symbol *sym;
+ char *last_imm, *endptr;
+ u64 orig_addr;
+
+ scnprintf(buf, len, " %-7s %s",
+ insn->mnemonic, insn->op_str);
+ /*
+ * Adjust instructions to keep the existing behavior with objdump.
+ *
+ * Example conversion:
+ * From: b #0xffff8000800114c8
+ * To: b ffff8000800114c8 <el0t_64_sync+0x108>
+ */
+ switch (insn->id) {
+ case ARM64_INS_B:
+ case ARM64_INS_BL:
+ case ARM64_INS_CBNZ:
+ case ARM64_INS_CBZ:
+ case ARM64_INS_TBNZ:
+ case ARM64_INS_TBZ:
+ case ARM64_INS_ADRP:
+ /* Extract last immediate value as address */
+ last_imm = strrchr(buf, '#');
+ if (!last_imm)
+ return;
+
+ orig_addr = strtoull(last_imm + 1, &endptr, 16);
+ if (endptr == last_imm + 1)
+ return;
+
+ /* Relocate map that contains the address */
+ if (dso__kernel(map__dso(map))) {
+ map = maps__find(map__kmaps(map), orig_addr);
+ if (map == NULL)
+ return;
+ }
+
+ /* Convert it to map-relative address for search */
+ addr = map__map_ip(map, orig_addr);
+
+ sym = map__find_symbol(map, addr);
+ if (sym == NULL)
+ return;
+
+ /* Symbolize the resolved address */
+ len = len - (last_imm - buf);
+ if (addr == sym->start) {
+ scnprintf(last_imm, len, "%"PRIx64" <%s>",
+ orig_addr, sym->name);
+ } else {
+ scnprintf(last_imm, len, "%"PRIx64" <%s+%#"PRIx64">",
+ orig_addr, sym->name, addr - sym->start);
+ }
+ break;
+ default:
+ break;
+ }
+}
+
+static void format_capstone_insn(struct cs_insn *insn, char *buf, size_t len,
+ struct annotate_args *args, u64 addr)
+{
+ /* TODO: support more architectures */
+ if (arch__is_x86(args->arch))
+ format_capstone_insn_x86(insn, buf, len, args, addr);
+ else if (arch__is_arm64(args->arch))
+ format_capstone_insn_arm64(insn, buf, len, args, addr);
+ else {
+ scnprintf(buf, len, " %-7s %s",
+ insn->mnemonic, insn->op_str);
+ }
+}
+
struct find_file_offset_data {
u64 ip;
u64 offset;
@@ -381,14 +469,9 @@ int symbol__disassemble_capstone(const char *filename __maybe_unused,
free_count = count = perf_cs_disasm(handle, buf, buf_len, start, buf_len, &insn);
for (i = 0, offset = 0; i < count; i++) {
- int printed;
-
- printed = scnprintf(disasm_buf, sizeof(disasm_buf),
- " %-7s %s",
- insn[i].mnemonic, insn[i].op_str);
- print_capstone_detail(&insn[i], disasm_buf + printed,
- sizeof(disasm_buf) - printed, args,
- start + offset);
+ format_capstone_insn(&insn[i], disasm_buf,
+ sizeof(disasm_buf), args,
+ start + offset);
args->offset = offset;
args->line = disasm_buf;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 40fcaed5d0b1..988b2b748e11 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -202,6 +202,11 @@ bool arch__is_powerpc(const struct arch *arch)
return arch->id.e_machine == EM_PPC || arch->id.e_machine == EM_PPC64;
}
+bool arch__is_arm64(const struct arch *arch)
+{
+ return arch->id.e_machine == EM_AARCH64;
+}
+
static void ins_ops__delete(struct ins_operands *ops)
{
if (ops == NULL)
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index a6e478caf61a..d3730ed86dba 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -111,6 +111,7 @@ struct annotate_args {
const struct arch *arch__find(uint16_t e_machine, uint32_t e_flags, const char *cpuid);
bool arch__is_x86(const struct arch *arch);
bool arch__is_powerpc(const struct arch *arch);
+bool arch__is_arm64(const struct arch *arch);
extern const struct ins_ops call_ops;
extern const struct ins_ops dec_ops;
--
2.34.1
* [PATCH v2 03/16] perf annotate-arm64: Generalize arm64_mov__parse to support standard operands
From: Tengda Wu @ 2026-04-03 9:47 UTC

The current arm64_mov__parse() implementation strictly requires the
operand to contain a symbol suffix in the "<symbol>" format. This
causes the parser to fail for standard instructions that only contain
raw immediates or registers without symbolic annotations.

Refactor the function to make symbol matching optional. The parser now
extracts the target operand first and only attempts to parse the
"<symbol>" suffix if one exists. This change also trims surrounding
whitespace and trailing comments from the target operand, and adds a
multi-register check via arm64__check_multi_regs(), ensuring
compatibility with a wider range of arm64 instruction formats.
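
To make the multi-register heuristic easier to experiment with, here is
a standalone sketch of the same idea. demo_check_multi_regs() is a
hypothetical name, and the kernel's skip_spaces() is replaced with a
plain loop so the snippet builds on its own.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Standalone copy of the heuristic for experimentation (hypothetical name). */
static bool demo_check_multi_regs(const char *op)
{
	const char *comma = strchr(op, ',');

	while (comma) {
		const char *next = comma + 1;

		while (isspace((unsigned char)*next))
			next++;

		/*
		 * '#' starts an immediate, while a letter most likely
		 * starts a register name (x1, w2, sp, xzr, ...).
		 */
		if (isalpha((unsigned char)*next))
			return true;

		comma = strchr(next, ',');
	}

	return false;
}
```

One caveat of this kind of check, as far as I can tell: shift and extend
specifiers such as "lsl" also start with a letter, so an operand like
"#0x1, lsl #16" would be reported as multi-register as well.
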
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 85 ++++++++++++++-----
1 file changed, 65 insertions(+), 20 deletions(-)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 33080fdca125..4c42323b0c18 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -14,12 +14,38 @@ struct arch_arm64 {
regex_t jump_insn;
};
+static bool arm64__check_multi_regs(const char *op)
+{
+ char *comma = strchr(op, ',');
+
+ while (comma) {
+ char *next = comma + 1;
+
+ next = skip_spaces(next);
+
+ /*
+ * Check the first valid character after the comma:
+ * - If it is '#', it indicates an immediate offset (e.g., [x1, #16]).
+ * - If it is an alphabetic character, it is highly likely a
+ * register name (e.g., x, w, s, d, q, v, p, z).
+ * - Special cases: Alias and control registers like sp, xzr,
+ * and wzr all start with an alphabetic character.
+ */
+ if (*next && *next != '#' && isalpha(*next))
+ return true;
+
+ comma = strchr(next, ',');
+ }
+
+ return false;
+}
+
static int arm64_mov__parse(const struct arch *arch __maybe_unused,
struct ins_operands *ops,
struct map_symbol *ms __maybe_unused,
struct disasm_line *dl __maybe_unused)
{
- char *s = strchr(ops->raw, ','), *target, *endptr;
+ char *s = strchr(ops->raw, ','), *target, *endptr, *comment, prev;
if (s == NULL)
return -1;
@@ -31,29 +57,48 @@ static int arm64_mov__parse(const struct arch *arch __maybe_unused,
if (ops->source.raw == NULL)
return -1;
- target = ++s;
+ target = skip_spaces(++s);
+ comment = strchr(s, arch->objdump.comment_char);
+
+ if (comment != NULL)
+ s = comment - 1;
+ else
+ s = strchr(s, '\0') - 1;
+
+ while (s > target && isspace(s[0]))
+ --s;
+ s++;
+ prev = *s;
+ *s = '\0';
ops->target.raw = strdup(target);
+ *s = prev;
+
if (ops->target.raw == NULL)
goto out_free_source;
- ops->target.addr = strtoull(target, &endptr, 16);
- if (endptr == target)
- goto out_free_target;
-
- s = strchr(endptr, '<');
- if (s == NULL)
- goto out_free_target;
- endptr = strchr(s + 1, '>');
- if (endptr == NULL)
- goto out_free_target;
-
- *endptr = '\0';
- *s = ' ';
- ops->target.name = strdup(s);
- *s = '<';
- *endptr = '>';
- if (ops->target.name == NULL)
- goto out_free_target;
+ ops->target.multi_regs = arm64__check_multi_regs(ops->target.raw);
+
+ /* Parse address followed by symbol name, e.g. "addr <symbol>" */
+ if (strchr(target, '<') != NULL) {
+ ops->target.addr = strtoull(target, &endptr, 16);
+ if (endptr == target)
+ goto out_free_target;
+
+ s = strchr(endptr, '<');
+ if (s == NULL)
+ goto out_free_target;
+ endptr = strchr(s + 1, '>');
+ if (endptr == NULL)
+ goto out_free_target;
+
+ *endptr = '\0';
+ *s = ' ';
+ ops->target.name = strdup(s);
+ *s = '<';
+ *endptr = '>';
+ if (ops->target.name == NULL)
+ goto out_free_target;
+ }
return 0;
--
2.34.1
* [PATCH v2 04/16] perf annotate-arm64: Handle load and store instructions
From: Tengda Wu @ 2026-04-03 9:47 UTC

Add arm64_ldst_ops to handle load and store instructions, so that the
data types and offsets associated with PMU events can be parsed for
memory access instructions.

arm64 has many variants of load and store instructions, which makes it
impractical to match every mnemonic exactly. Therefore, only the
instruction prefixes are matched: 'ld|st' covers most of the memory
access instructions, 'cas|swp' matches atomic instructions, and 'prf'
matches memory prefetch instructions.
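
The effect of the prefix match can be checked in isolation with POSIX
regex. The helper below is a sketch only: demo_is_ldst() is a
hypothetical name, and it compiles the pattern on every call for
brevity, whereas the patch compiles it once in arch__new_arm64().

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if @name starts with a load/store prefix. */
static bool demo_is_ldst(const char *name)
{
	regex_t re;
	bool match;

	/* Same pattern as in the patch; REG_NOSUB since no groups are read */
	if (regcomp(&re, "^(ld|st|cas|prf|swp)", REG_EXTENDED | REG_NOSUB))
		return false;

	match = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

With this pattern, mnemonics such as ldr, stp, casal, prfm and swpal
match, while mov, add and adrp do not.
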
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 72 +++++++++++++++++++
1 file changed, 72 insertions(+)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 4c42323b0c18..8209faaa6086 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -3,7 +3,9 @@
#include <errno.h>
#include <stdlib.h>
#include <string.h>
+#include <ctype.h>
#include <linux/zalloc.h>
+#include <linux/string.h>
#include <regex.h>
#include "../annotate.h"
#include "../disasm.h"
@@ -12,6 +14,7 @@ struct arch_arm64 {
struct arch arch;
regex_t call_insn;
regex_t jump_insn;
+ regex_t ldst_insn; /* load and store instruction */
};
static bool arm64__check_multi_regs(const char *op)
@@ -114,6 +117,59 @@ static const struct ins_ops arm64_mov_ops = {
.scnprintf = mov__scnprintf,
};
+static int arm64_ldst__parse(const struct arch *arch __maybe_unused,
+ struct ins_operands *ops,
+ struct map_symbol *ms __maybe_unused,
+ struct disasm_line *dl __maybe_unused)
+{
+ char *s, *target;
+
+ /*
+ * The part starting from the memory access annotation '[' is parsed
+ * as 'target', while the part before it is parsed as 'source'.
+ */
+ target = s = strchr(ops->raw, arch->objdump.memory_ref_char);
+ if (!s)
+ return -1;
+
+ while (s > ops->raw && *s != ',')
+ --s;
+
+ if (s == ops->raw)
+ return -1;
+
+ *s = '\0';
+ ops->source.raw = strdup(ops->raw);
+
+ *s = ',';
+ if (!ops->source.raw)
+ return -1;
+
+ ops->source.multi_regs = arm64__check_multi_regs(ops->source.raw);
+
+ ops->target.raw = strdup(target);
+ if (!ops->target.raw) {
+ zfree(&ops->source.raw);
+ return -1;
+ }
+ ops->target.mem_ref = true;
+ ops->target.multi_regs = arm64__check_multi_regs(ops->target.raw);
+
+ return 0;
+}
+
+static int arm64_ldst__scnprintf(const struct ins *ins, char *bf, size_t size,
+ struct ins_operands *ops, int max_ins_name)
+{
+ return scnprintf(bf, size, "%-*s %s,%s", max_ins_name, ins->name,
+ ops->source.raw, ops->target.raw);
+}
+
+static struct ins_ops arm64_ldst_ops = {
+ .parse = arm64_ldst__parse,
+ .scnprintf = arm64_ldst__scnprintf,
+};
+
static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
{
struct arch_arm64 *arm = container_of(arch, struct arch_arm64, arch);
@@ -124,6 +180,8 @@ static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch,
ops = &jump_ops;
else if (!regexec(&arm->call_insn, name, 2, match, 0))
ops = &call_ops;
+ else if (!regexec(&arm->ldst_insn, name, 2, match, 0))
+ ops = &arm64_ldst_ops;
else if (!strcmp(name, "ret"))
ops = &ret_ops;
else
@@ -148,6 +206,8 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
arch->id = *id;
arch->objdump.comment_char = '/';
arch->objdump.skip_functions_char = '+';
+ arch->objdump.memory_ref_char = '[';
+ arch->objdump.imm_char = '#';
arch->associate_instruction_ops = arm64__associate_instruction_ops;
/* bl, blr */
@@ -161,8 +221,20 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
if (err)
goto out_free_call;
+ /*
+ * The ARM64 architecture has many variants of load/store instructions.
+ * It is quite challenging to match all of them completely. Here, we
+ * only match the prefixes of these instructions.
+ */
+ err = regcomp(&arm->ldst_insn, "^(ld|st|cas|prf|swp)",
+ REG_EXTENDED);
+ if (err)
+ goto out_free_jump;
+
return arch;
+out_free_jump:
+ regfree(&arm->jump_insn);
out_free_call:
regfree(&arm->call_insn);
out_free_arm:
--
2.34.1
* [PATCH v2 05/16] perf annotate: Introduce extract_op_location callback for arch-specific parsing
From: Tengda Wu @ 2026-04-03 9:47 UTC
Assembly syntax for operands varies significantly across different
architectures, which prevents the operand location (op_loc) parsing
logic in annotate_get_insn_location() from being directly reused.
To simplify the core logic and improve maintainability, move the
operand parsing inside the for_each_insn_op_loc loop into arch-specific
extract_op_location callbacks. This refactoring is intended to be a
cleanup with no functional changes.
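
The shape of the refactoring can be sketched with a function pointer in
a per-architecture structure. Everything below (demo_arch, demo_op_loc,
demo_extract_att, demo_get_location) is hypothetical and much simpler
than the real perf structures; it only shows how the core loop can hand
operand parsing to an arch-specific callback.

```c
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical and only mirror the refactoring. */
struct demo_op_loc {
	int reg1;
	long offset;
	int mem_ref;
};

struct demo_arch {
	/* Each architecture supplies its own operand parser. */
	int (*extract_op_location)(const char *op_str,
				   struct demo_op_loc *loc);
};

/* Toy parser for "OFFSET(REG)" operands; real x86 parsing does more. */
static int demo_extract_att(const char *op_str, struct demo_op_loc *loc)
{
	char *end;

	/* A missing offset parses as 0, so "(%rax)" equals "0(%rax)" */
	loc->offset = strtol(op_str, &end, 0);
	if (*end != '(')
		return -1;

	loc->mem_ref = 1;
	return 0;
}

/* Core code: reset the location, then defer to the arch callback. */
static int demo_get_location(const struct demo_arch *arch,
			     const char *op_str, struct demo_op_loc *loc)
{
	loc->reg1 = -1;
	loc->offset = 0;
	loc->mem_ref = 0;

	if (op_str == NULL || arch->extract_op_location == NULL)
		return 0;

	return arch->extract_op_location(op_str, loc);
}
```

The core stays free of syntax assumptions this way, and a new
architecture only needs to provide its own callback.
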
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../util/annotate-arch/annotate-powerpc.c | 10 ++
tools/perf/util/annotate-arch/annotate-x86.c | 82 ++++++++++++++++
tools/perf/util/annotate.c | 96 ++-----------------
tools/perf/util/annotate.h | 2 +
tools/perf/util/disasm.h | 4 +
5 files changed, 105 insertions(+), 89 deletions(-)
diff --git a/tools/perf/util/annotate-arch/annotate-powerpc.c b/tools/perf/util/annotate-arch/annotate-powerpc.c
index 218207b52581..8d0b8def5955 100644
--- a/tools/perf/util/annotate-arch/annotate-powerpc.c
+++ b/tools/perf/util/annotate-arch/annotate-powerpc.c
@@ -390,6 +390,15 @@ static void update_insn_state_powerpc(struct type_state *state,
}
#endif /* HAVE_LIBDW_SUPPORT */
+static int extract_op_location_powerpc(const struct arch *arch __maybe_unused,
+ struct disasm_line *dl,
+ const char *op_str __maybe_unused, int op_idx,
+ struct annotated_op_loc *op_loc)
+{
+ get_powerpc_regs(dl->raw.raw_insn, !op_idx, op_loc);
+ return 0;
+}
+
const struct arch *arch__new_powerpc(const struct e_machine_and_e_flags *id,
const char *cpuid __maybe_unused)
{
@@ -406,5 +415,6 @@ const struct arch *arch__new_powerpc(const struct e_machine_and_e_flags *id,
#ifdef HAVE_LIBDW_SUPPORT
arch->update_insn_state = update_insn_state_powerpc;
#endif
+ arch->extract_op_location = extract_op_location_powerpc;
return arch;
}
diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
index c77aabd48eba..c63ca3250b95 100644
--- a/tools/perf/util/annotate-arch/annotate-x86.c
+++ b/tools/perf/util/annotate-arch/annotate-x86.c
@@ -3,6 +3,7 @@
#include <linux/compiler.h>
#include <assert.h>
#include <inttypes.h>
+#include <ctype.h>
#include "../annotate-data.h"
#include "../debug.h"
#include "../disasm.h"
@@ -808,6 +809,86 @@ static void update_insn_state_x86(struct type_state *state,
}
#endif
+/*
+ * Get register number and access offset from the given instruction.
+ * It assumes AT&T x86 asm format like OFFSET(REG). Maybe it needs
+ * to revisit the format when it handles different architecture.
+ * Fills @reg and @offset when return 0.
+ */
+static int extract_reg_offset(const struct arch *arch, const char *str,
+ struct annotated_op_loc *op_loc)
+{
+ char *p;
+
+ if (arch->objdump.register_char == 0)
+ return -1;
+
+ /*
+ * It should start from offset, but it's possible to skip 0
+ * in the asm. So 0(%rax) should be same as (%rax).
+ *
+ * However, it also start with a segment select register like
+ * %gs:0x18(%rbx). In that case it should skip the part.
+ */
+ if (*str == arch->objdump.register_char) {
+ /* FIXME: Handle other segment registers */
+ if (!strncmp(str, "%gs:", 4))
+ op_loc->segment = INSN_SEG_X86_GS;
+
+ while (*str && !isdigit(*str) &&
+ *str != arch->objdump.memory_ref_char)
+ str++;
+ }
+
+ op_loc->offset = strtol(str, &p, 0);
+ op_loc->reg1 = arch__dwarf_regnum(arch, p);
+ if (op_loc->reg1 == -1)
+ return -1;
+
+ /* Get the second register */
+ if (op_loc->multi_regs)
+ op_loc->reg2 = arch__dwarf_regnum(arch, p + 1);
+
+ return 0;
+}
+
+static int extract_op_location_x86(const struct arch *arch,
+ struct disasm_line *dl __maybe_unused,
+ const char *op_str, int op_idx __maybe_unused,
+ struct annotated_op_loc *op_loc)
+{
+ const char *s = op_str;
+ char *p = NULL;
+
+ if (op_str == NULL)
+ return 0;
+
+ if (strchr(op_str, arch->objdump.memory_ref_char)) {
+ op_loc->mem_ref = true;
+ return extract_reg_offset(arch, op_str, op_loc);
+ }
+
+ /* FIXME: Handle other segment registers */
+ if (!strncmp(op_str, "%gs:", 4)) {
+ op_loc->segment = INSN_SEG_X86_GS;
+ op_loc->offset = strtol(op_str + 4,
+ &p, 0);
+ if (p && p != op_str + 4)
+ op_loc->imm = true;
+ return 0;
+ }
+
+ if (*s == arch->objdump.register_char) {
+ op_loc->reg1 = arch__dwarf_regnum(arch, s);
+ } else if (*s == arch->objdump.imm_char) {
+ op_loc->offset = strtol(s + 1, &p, 0);
+ if (p && p != s + 1)
+ op_loc->imm = true;
+ }
+
+ return 0;
+}
+
const struct arch *arch__new_x86(const struct e_machine_and_e_flags *id, const char *cpuid)
{
struct arch *arch = zalloc(sizeof(*arch));
@@ -847,5 +928,6 @@ const struct arch *arch__new_x86(const struct e_machine_and_e_flags *id, const c
#ifdef HAVE_LIBDW_SUPPORT
arch->update_insn_state = update_insn_state_x86;
#endif
+ arch->extract_op_location = extract_op_location_x86;
return arch;
}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 63f0ee9d4c03..1bf69e00d76d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2450,7 +2450,7 @@ int annotate_check_args(void)
return 0;
}
-static int arch__dwarf_regnum(const struct arch *arch, const char *str)
+int arch__dwarf_regnum(const struct arch *arch, const char *str)
{
const char *p;
char *regname, *q;
@@ -2473,51 +2473,6 @@ static int arch__dwarf_regnum(const struct arch *arch, const char *str)
return reg;
}
-/*
- * Get register number and access offset from the given instruction.
- * It assumes AT&T x86 asm format like OFFSET(REG). Maybe it needs
- * to revisit the format when it handles different architecture.
- * Fills @reg and @offset when return 0.
- */
-static int extract_reg_offset(const struct arch *arch, const char *str,
- struct annotated_op_loc *op_loc)
-{
- char *p;
-
- if (arch->objdump.register_char == 0)
- return -1;
-
- /*
- * It should start from offset, but it's possible to skip 0
- * in the asm. So 0(%rax) should be same as (%rax).
- *
- * However, it also start with a segment select register like
- * %gs:0x18(%rbx). In that case it should skip the part.
- */
- if (*str == arch->objdump.register_char) {
- if (arch__is_x86(arch)) {
- /* FIXME: Handle other segment registers */
- if (!strncmp(str, "%gs:", 4))
- op_loc->segment = INSN_SEG_X86_GS;
- }
-
- while (*str && !isdigit(*str) &&
- *str != arch->objdump.memory_ref_char)
- str++;
- }
-
- op_loc->offset = strtol(str, &p, 0);
- op_loc->reg1 = arch__dwarf_regnum(arch, p);
- if (op_loc->reg1 == -1)
- return -1;
-
- /* Get the second register */
- if (op_loc->multi_regs)
- op_loc->reg2 = arch__dwarf_regnum(arch, p + 1);
-
- return 0;
-}
-
/**
* annotate_get_insn_location - Get location of instruction
* @arch: the architecture info
@@ -2548,6 +2503,7 @@ int annotate_get_insn_location(const struct arch *arch, struct disasm_line *dl,
struct ins_operands *ops;
struct annotated_op_loc *op_loc;
int i;
+ int ret;
if (ins__is_lock(&dl->ins))
ops = dl->ops.locked.ops;
@@ -2573,50 +2529,12 @@ int annotate_get_insn_location(const struct arch *arch, struct disasm_line *dl,
/* Invalidate the register by default */
op_loc->reg1 = -1;
op_loc->reg2 = -1;
+ op_loc->mem_ref = mem_ref;
+ op_loc->multi_regs = multi_regs;
- if (insn_str == NULL) {
- if (!arch__is_powerpc(arch))
- continue;
- }
-
- /*
- * For powerpc, call get_powerpc_regs function which extracts the
- * required fields for op_loc, ie reg1, reg2, offset from the
- * raw instruction.
- */
- if (arch__is_powerpc(arch)) {
- op_loc->mem_ref = mem_ref;
- op_loc->multi_regs = multi_regs;
- get_powerpc_regs(dl->raw.raw_insn, !i, op_loc);
- } else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
- op_loc->mem_ref = true;
- op_loc->multi_regs = multi_regs;
- extract_reg_offset(arch, insn_str, op_loc);
- } else {
- const char *s = insn_str;
- char *p = NULL;
-
- if (arch__is_x86(arch)) {
- /* FIXME: Handle other segment registers */
- if (!strncmp(insn_str, "%gs:", 4)) {
- op_loc->segment = INSN_SEG_X86_GS;
- op_loc->offset = strtol(insn_str + 4,
- &p, 0);
- if (p && p != insn_str + 4)
- op_loc->imm = true;
- continue;
- }
- }
-
- if (*s == arch->objdump.register_char) {
- op_loc->reg1 = arch__dwarf_regnum(arch, s);
- }
- else if (*s == arch->objdump.imm_char) {
- op_loc->offset = strtol(s + 1, &p, 0);
- if (p && p != s + 1)
- op_loc->imm = true;
- }
- }
+ ret = arch->extract_op_location(arch, dl, insn_str, i, op_loc);
+ if (ret)
+ return ret;
}
return 0;
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 696e36dbf013..71195a27d38f 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -494,6 +494,8 @@ int annotate_parse_percent_type(const struct option *opt, const char *_str,
int annotate_check_args(void);
+int arch__dwarf_regnum(const struct arch *arch, const char *str);
+
/**
* struct annotated_op_loc - Location info of instruction operand
* @reg1: First register in the operand
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index d3730ed86dba..94ee67bcbce7 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -16,6 +16,7 @@ struct symbol;
struct data_loc_info;
struct type_state;
struct disasm_line;
+struct annotated_op_loc;
struct e_machine_and_e_flags {
uint32_t e_flags;
@@ -49,6 +50,9 @@ struct arch {
struct data_loc_info *dloc, Dwarf_Die *cu_die,
struct disasm_line *dl);
#endif
+ int (*extract_op_location)(const struct arch *arch, struct disasm_line *dl,
+ const char *op_str, int op_idx,
+ struct annotated_op_loc *op_loc);
};
struct ins {
--
2.34.1
* [PATCH v2 06/16] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
2026-04-03 9:47 [PATCH v2 00/16] perf arm64: Support data type profiling Tengda Wu
` (4 preceding siblings ...)
2026-04-03 9:47 ` [PATCH v2 05/16] perf annotate: Introduce extract_op_location callback for arch-specific parsing Tengda Wu
@ 2026-04-03 9:47 ` Tengda Wu
2026-04-03 9:47 ` [PATCH v2 07/16] perf annotate-arm64: Implement extract_op_location() callback Tengda Wu
` (10 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
The current arm64 DWARF register lookup relies on 'aarch64_regstr_tbl',
a static string table. While this works for kprobe-tracer where register
names start with '%', it is insufficient for parsing register numbers
directly from raw instructions (e.g., extracting '6' from 'x6' or 'w6')
during annotation.
Since get_dwarf_regnum() is currently used only by 'perf annotate' and
does not affect kprobe-tracer, replace the limited table-based lookup
with a programmatic implementation in __get_dwarf_regnum_arm64(). This
allows resolving arm64 register names (x0-x30, w0-w30, sp, wzr, xzr) directly
into their corresponding DWARF register numbers.
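For illustration only, the programmatic lookup described above can be sketched as a standalone function. This is not the patch's code: the name sketch_arm64_regnum() is hypothetical, it returns -1 instead of -ENOENT, and it additionally rejects trailing characters, which the patch does not need to do because the caller trims the register name first.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: map an arm64 register name to a DWARF register
 * number. x0-x30 and w0-w30 map to 0-30 and "sp" maps to 31. Following
 * the patch, "wzr" and "xzr" are folded onto 31 as well.
 * Returns -1 for any other name.
 */
static int sketch_arm64_regnum(const char *name)
{
	char *end;
	long reg;

	if (!strcmp(name, "sp") || !strcmp(name, "wzr") || !strcmp(name, "xzr"))
		return 31;

	/* Only the general-purpose 'x' and 'w' views are handled */
	if (*name != 'x' && *name != 'w')
		return -1;

	name++;
	if (!isdigit((unsigned char)*name))
		return -1;

	reg = strtol(name, &end, 10);
	/* Extra strictness of this sketch: reject input such as "x6abc" */
	if (*end != '\0')
		return -1;

	return (reg >= 0 && reg <= 30) ? (int)reg : -1;
}
```

With this sketch, both "x6" and "w6" resolve to 6, while "x31" is rejected because the architectural name for register 31 is "sp" or the zero register.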
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../util/dwarf-regs-arch/dwarf-regs-arm64.c | 20 +++++++++++++++++++
tools/perf/util/dwarf-regs.c | 2 +-
tools/perf/util/include/dwarf-regs.h | 1 +
3 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
index 593ca7d4fccc..be55fc2a4f38 100644
--- a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
+++ b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include <errno.h>
+#include <ctype.h>
#include <dwarf-regs.h>
#include "../../../arch/arm64/include/uapi/asm/perf_regs.h"
@@ -10,3 +11,22 @@ int __get_dwarf_regnum_for_perf_regnum_arm64(int perf_regnum)
return perf_regnum;
}
+
+int __get_dwarf_regnum_arm64(const char *name)
+{
+ int reg;
+
+ if (!strcmp(name, "sp") || !strcmp(name, "wzr") || !strcmp(name, "xzr"))
+ return 31;
+
+ if (*name != 'x' && *name != 'w')
+ return -ENOENT;
+
+ name++;
+ if (!isdigit(*name))
+ return -ENOENT;
+
+ reg = strtol(name, NULL, 10);
+
+ return reg >= 0 && reg <= 30 ? reg : -ENOENT;
+}
diff --git a/tools/perf/util/dwarf-regs.c b/tools/perf/util/dwarf-regs.c
index 797f455eba0d..bacf5c13c3bc 100644
--- a/tools/perf/util/dwarf-regs.c
+++ b/tools/perf/util/dwarf-regs.c
@@ -114,7 +114,7 @@ int get_dwarf_regnum(const char *name, unsigned int machine, unsigned int flags)
reg = _get_dwarf_regnum(arm_regstr_tbl, name);
break;
case EM_AARCH64:
- reg = _get_dwarf_regnum(aarch64_regstr_tbl, name);
+ reg = __get_dwarf_regnum_arm64(name);
break;
case EM_CSKY:
reg = __get_csky_regnum(name, flags);
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 46a764cf322f..a25f038bbff2 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -105,6 +105,7 @@ int __get_dwarf_regnum_x86_64(const char *name);
int __get_dwarf_regnum_for_perf_regnum_i386(int perf_regnum);
int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum);
+int __get_dwarf_regnum_arm64(const char *name);
int __get_dwarf_regnum_for_perf_regnum_arm(int perf_regnum);
int __get_dwarf_regnum_for_perf_regnum_arm64(int perf_regnum);
--
2.34.1
* [PATCH v2 07/16] perf annotate-arm64: Implement extract_op_location() callback
2026-04-03 9:47 [PATCH v2 00/16] perf arm64: Support data type profiling Tengda Wu
` (5 preceding siblings ...)
2026-04-03 9:47 ` [PATCH v2 06/16] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64 Tengda Wu
@ 2026-04-03 9:47 ` Tengda Wu
2026-04-07 7:26 ` Namhyung Kim
2026-04-03 9:47 ` [PATCH v2 08/16] perf annotate-arm64: Enable instruction tracking support Tengda Wu
` (9 subsequent siblings)
16 siblings, 1 reply; 22+ messages in thread
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Implement the extract_op_location() callback for the arm64 architecture
to handle its specific assembly syntax and addressing modes.
The extractor handles:
1. Standalone immediate operands (e.g., #0x10).
2. Memory references with diverse addressing modes:
- Signed offset: [base, #imm]
- Pre-index: [base, #imm]!
- Post-index: [base], #imm
3. Multi-register operands and primary/secondary register extraction.
This enables 'perf annotate' to resolve memory locations and registers
required for data type profiling on arm64.
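The addressing-mode detection listed above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions: the enum and function names are hypothetical, the '[' and '#' characters are hard-coded rather than taken from arch->objdump, and a malformed operand yields SKETCH_ADDR_NONE where the patch returns -1.

```c
#include <string.h>

/* Hypothetical stand-in for enum annotated_addr_mode */
enum sketch_addr_mode {
	SKETCH_ADDR_NONE = 0,
	SKETCH_ADDR_SIGNED_OFFSET,
	SKETCH_ADDR_PRE_INDEX,
	SKETCH_ADDR_POST_INDEX,
};

/* Classify an arm64 memory operand by what follows the closing ']' */
static enum sketch_addr_mode sketch_classify(const char *op)
{
	const char *close;

	/* Not a memory reference at all */
	if (op == NULL || *op != '[')
		return SKETCH_ADDR_NONE;

	close = strchr(op, ']');
	if (close == NULL)
		return SKETCH_ADDR_NONE;

	/* Pre-index: [base, #imm]! */
	if (close[1] == '!')
		return SKETCH_ADDR_PRE_INDEX;

	/* Post-index: [base], #imm */
	if (close[1] == ',' && strchr(close + 1, '#'))
		return SKETCH_ADDR_POST_INDEX;

	/* Signed offset: [base] or [base, #imm] */
	return SKETCH_ADDR_SIGNED_OFFSET;
}
```

The key observation is that all three modes share the same "[base" prefix, so only the text after ']' distinguishes them.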
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 64 +++++++++++++++++++
tools/perf/util/annotate.c | 12 ++--
tools/perf/util/annotate.h | 10 +++
3 files changed, 81 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 8209faaa6086..1fe4c503431b 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -191,6 +191,69 @@ static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch,
return ops;
}
+static int extract_op_location_arm64(const struct arch *arch,
+ struct disasm_line *dl __maybe_unused,
+ const char *op_str, int op_idx __maybe_unused,
+ struct annotated_op_loc *op_loc)
+{
+ const char *s = op_str;
+ char *p = NULL;
+
+ if (op_str == NULL)
+ return 0;
+
+ /* Handle standalone immediate operands (e.g., #0x10) */
+ if (*s == arch->objdump.imm_char) {
+ op_loc->offset = strtol(s + 1, &p, 0);
+ if (p && p != s + 1)
+ op_loc->imm = true;
+ return 0;
+ }
+
+ /*
+ * Handle memory references (e.g., [x0, #8]), identify
+ * arm64 specific addressing modes
+ */
+ if (*s == arch->objdump.memory_ref_char) {
+ op_loc->mem_ref = true;
+
+ p = strchr(s, ']');
+ if (p == NULL)
+ return -1;
+
+ /* Pre-index: [base, #imm]! */
+ if (p[1] == '!')
+ op_loc->addr_mode = INSN_ADDR_PRE_INDEX;
+ /* Post-index: [base], #imm */
+ else if (p[1] == ',' && strchr(p + 1, arch->objdump.imm_char))
+ op_loc->addr_mode = INSN_ADDR_POST_INDEX;
+ /* Signed offset: [base{, #imm}] */
+ else
+ op_loc->addr_mode = INSN_ADDR_SIGNED_OFFSET;
+
+ s++;
+ }
+
+ /* Extract the primary register */
+ op_loc->reg1 = arch__dwarf_regnum(arch, s);
+ if (op_loc->reg1 == -1)
+ return -1;
+
+ /* Move to the next symbol of the operand, if any */
+ s = strchr(s, ',');
+ if (s == NULL)
+ return 0;
+ s = skip_spaces(s + 1);
+
+ /* Parse secondary register or immediate offset */
+ if (op_loc->multi_regs)
+ op_loc->reg2 = arch__dwarf_regnum(arch, s);
+ else if (*s == arch->objdump.imm_char)
+ op_loc->offset = strtol(s + 1, &p, 0);
+
+ return 0;
+}
+
const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
const char *cpuid __maybe_unused)
{
@@ -209,6 +272,7 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
arch->objdump.memory_ref_char = '[';
arch->objdump.imm_char = '#';
arch->associate_instruction_ops = arm64__associate_instruction_ops;
+ arch->extract_op_location = extract_op_location_arm64;
/* bl, blr */
err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 1bf69e00d76d..c4d1cb3a7ae4 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2452,19 +2452,21 @@ int annotate_check_args(void)
int arch__dwarf_regnum(const struct arch *arch, const char *str)
{
- const char *p;
+ const char *p = str;
char *regname, *q;
int reg;
- p = strchr(str, arch->objdump.register_char);
- if (p == NULL)
- return -1;
+ if (arch->objdump.register_char) {
+ p = strchr(str, arch->objdump.register_char);
+ if (p == NULL)
+ return -1;
+ }
regname = strdup(p);
if (regname == NULL)
return -1;
- q = strpbrk(regname, ",) ");
+ q = strpbrk(regname, ",)] ");
if (q)
*q = '\0';
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 71195a27d38f..0391c6a9f011 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -496,12 +496,21 @@ int annotate_check_args(void);
int arch__dwarf_regnum(const struct arch *arch, const char *str);
+enum annotated_addr_mode {
+ INSN_ADDR_NONE = 0,
+
+ INSN_ADDR_SIGNED_OFFSET,
+ INSN_ADDR_PRE_INDEX,
+ INSN_ADDR_POST_INDEX,
+};
+
/**
* struct annotated_op_loc - Location info of instruction operand
* @reg1: First register in the operand
* @reg2: Second register in the operand
* @offset: Memory access offset in the operand
* @segment: Segment selector register
+ * @addr_mode: Addressing mode, only valid if @mem_ref is true
* @mem_ref: Whether the operand accesses memory
* @multi_regs: Whether the second register is used
* @imm: Whether the operand is an immediate value (in offset)
@@ -511,6 +520,7 @@ struct annotated_op_loc {
int reg2;
int offset;
u8 segment;
+ u8 addr_mode;
bool mem_ref;
bool multi_regs;
bool imm;
--
2.34.1
* [PATCH v2 08/16] perf annotate-arm64: Enable instruction tracking support
2026-04-03 9:47 [PATCH v2 00/16] perf arm64: Support data type profiling Tengda Wu
` (6 preceding siblings ...)
2026-04-03 9:47 ` [PATCH v2 07/16] perf annotate-arm64: Implement extract_op_location() callback Tengda Wu
@ 2026-04-03 9:47 ` Tengda Wu
2026-04-03 9:47 ` [PATCH v2 09/16] perf annotate-arm64: Support load instruction tracking Tengda Wu
` (8 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Enable instruction tracking for the arm64 architecture in 'perf annotate'
to support data type profiling.
Define ARM64_REG_SP as 31 to correctly identify the stack pointer
register during type state initialization. Update
arch_supports_insn_tracking() to include arm64, which allows
find_data_type_block() to process the instruction scope for arm64.
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
tools/perf/util/annotate-data.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 1eff0a27237d..fd6416d43a2e 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -28,6 +28,7 @@
/* register number of the stack pointer */
#define X86_REG_SP 7
+#define ARM64_REG_SP 31
static void delete_var_types(struct die_var_type *var_types);
@@ -177,7 +178,8 @@ static void init_type_state(struct type_state *state, const struct arch *arch)
state->regs[11].caller_saved = true;
state->ret_reg = 0;
state->stack_reg = X86_REG_SP;
- }
+ } else if (arch__is_arm64(arch))
+ state->stack_reg = ARM64_REG_SP;
}
static void exit_type_state(struct type_state *state)
@@ -1421,7 +1423,8 @@ static enum type_match_result find_data_type_insn(struct data_loc_info *dloc,
static int arch_supports_insn_tracking(struct data_loc_info *dloc)
{
- if ((arch__is_x86(dloc->arch)) || (arch__is_powerpc(dloc->arch)))
+ if ((arch__is_x86(dloc->arch)) || (arch__is_powerpc(dloc->arch)) ||
+ (arch__is_arm64(dloc->arch)))
return 1;
return 0;
}
--
2.34.1
* [PATCH v2 09/16] perf annotate-arm64: Support load instruction tracking
2026-04-03 9:47 [PATCH v2 00/16] perf arm64: Support data type profiling Tengda Wu
` (7 preceding siblings ...)
2026-04-03 9:47 ` [PATCH v2 08/16] perf annotate-arm64: Enable instruction tracking support Tengda Wu
@ 2026-04-03 9:47 ` Tengda Wu
2026-04-03 9:47 ` [PATCH v2 10/16] perf annotate-arm64: Support store " Tengda Wu
` (7 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Implement update_insn_state() for arm64 to track register state changes
during load (LDR) instructions. This is essential for maintaining accurate
type information when data is moved from memory to registers.
The implementation handles the three primary arm64 addressing modes:
1. Signed offset: [base, #imm]
2. Pre-index: [base, #imm]!
3. Post-index: [base], #imm
Introduce adjust_reg_index_state() to handle the side effects of pre-index
and post-index addressing, where the base register is updated with the
offset before or after the memory access, respectively. This ensures that
the register's offset within a structure is correctly tracked across
sequential instructions.
A real-world example is shown below:
ffff80008011f5b0 <pick_task_stop>:
ffff80008011f5b8: ldr x0, [x0, #2712] // x0: struct rq* -> task_struct*
ffff80008011f5c0: ldr w1, [x0, #104] // PMU sample at offset 0x68
Before this commit, the type of x0 was incorrectly inferred as 'struct rq':
find data type for 0x68(reg0) at pick_task_stop+0x10
var [8] reg0 offset 0 type='struct rq*'
chk [10] reg0 offset=0x68 ok=1 kind=1 (struct rq*) : Good!
final result: type='struct rq'
After this commit, the type of x0 is correctly inferred as 'struct task_struct':
find data type for 0x68(reg0) at pick_task_stop+0x10
var [8] reg0 offset 0 type='struct rq*'
ldr [8] 0xa98(reg0) -> reg0 type='struct task_struct*'
chk [10] reg0 offset=0x68 ok=1 kind=1 (struct task_struct*) : Good!
final result: type='struct task_struct'
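The offset arithmetic behind the three addressing modes can be sketched as two small functions. This is an illustration, not the patch's code: the names are hypothetical, and plain integers stand in for the tracked register state. sketch_access_offset() plays the role of get_mem_offset(), and sketch_base_after() the role of adjust_reg_index_state().

```c
/* Hypothetical stand-in for the addressing modes used by the tracker */
enum sketch_mode {
	SKETCH_SIGNED_OFFSET,	/* [base, #imm]  */
	SKETCH_PRE_INDEX,	/* [base, #imm]! */
	SKETCH_POST_INDEX,	/* [base], #imm  */
};

/*
 * Offset, relative to the tracked type, that the memory access uses.
 * Post-index accesses the old base; the other modes add the immediate.
 */
static int sketch_access_offset(enum sketch_mode mode, int base_off, int imm)
{
	if (mode == SKETCH_POST_INDEX)
		return base_off;

	return base_off + imm;
}

/*
 * Tracked offset of the base register once the instruction retires.
 * Only pre-index and post-index write the base register back.
 */
static int sketch_base_after(enum sketch_mode mode, int base_off, int imm)
{
	if (mode == SKETCH_SIGNED_OFFSET)
		return base_off;

	return base_off + imm;
}
```

For the 'ldr w1, [x0, #104]' sample above, the mode is signed offset, so the access lands at offset 104 (0x68) and x0 keeps its tracked offset of 0.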
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 87 +++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 1fe4c503431b..cac2bf0021c9 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -9,6 +9,8 @@
#include <regex.h>
#include "../annotate.h"
#include "../disasm.h"
+#include "../annotate-data.h"
+#include "../debug.h"
struct arch_arm64 {
struct arch arch;
@@ -254,6 +256,88 @@ static int extract_op_location_arm64(const struct arch *arch,
return 0;
}
+#ifdef HAVE_LIBDW_SUPPORT
+static int get_mem_offset(struct annotated_op_loc *op_loc, int type_offset)
+{
+ if (op_loc->addr_mode == INSN_ADDR_POST_INDEX)
+ return type_offset;
+
+ return op_loc->offset + type_offset;
+}
+
+static void adjust_reg_index_state(struct type_state *state, int reg,
+ struct annotated_op_loc *op_loc,
+ const char *insn_name, u32 insn_offset)
+{
+ struct type_state_reg *tsr;
+
+ if (!has_reg_type(state, reg) ||
+ (op_loc->addr_mode != INSN_ADDR_PRE_INDEX &&
+ op_loc->addr_mode != INSN_ADDR_POST_INDEX))
+ return;
+
+ tsr = &state->regs[reg];
+ tsr->offset = op_loc->offset + tsr->offset;
+ tsr->ok = true;
+
+ pr_debug_dtp("%s [%x] %s-index %#x(reg%d) -> reg%d", insn_name,
+ insn_offset, op_loc->addr_mode == INSN_ADDR_PRE_INDEX ?
+ "pre" : "post", op_loc->offset, reg, reg);
+ pr_debug_type_name(&tsr->type, tsr->kind);
+}
+
+static void update_insn_state_arm64(struct type_state *state,
+ struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
+ struct disasm_line *dl)
+{
+ struct annotated_insn_loc loc;
+ struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
+ struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
+ struct type_state_reg *tsr;
+ Dwarf_Die type_die;
+ u32 insn_offset = dl->al.offset;
+ int sreg, dreg;
+
+ if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
+ return;
+
+ sreg = src->reg1;
+ dreg = dst->reg1;
+
+ /* Memory to register transfers */
+ if (!strncmp(dl->ins.name, "ld", 2)) {
+ struct type_state_reg dst_tsr;
+
+ if (!has_reg_type(state, sreg) ||
+ !has_reg_type(state, dreg) ||
+ !state->regs[dreg].ok)
+ return;
+
+ tsr = &state->regs[sreg];
+ tsr->copied_from = -1;
+ dst_tsr = state->regs[dreg];
+
+ /* Dereference the pointer if it has one */
+ if (dst_tsr.kind == TSR_KIND_TYPE &&
+ die_deref_ptr_type(&dst_tsr.type,
+ get_mem_offset(dst, dst_tsr.offset),
+ &type_die)) {
+ tsr->type = type_die;
+ tsr->kind = TSR_KIND_TYPE;
+ tsr->offset = 0;
+ tsr->ok = true;
+
+ pr_debug_dtp("ldr [%x] %#x(reg%d) -> reg%d",
+ insn_offset, dst->offset, dreg, sreg);
+ pr_debug_type_name(&tsr->type, tsr->kind);
+
+ adjust_reg_index_state(state, dreg, dst, "ldr", insn_offset);
+ }
+ return;
+ }
+}
+#endif
+
const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
const char *cpuid __maybe_unused)
{
@@ -273,6 +357,9 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
arch->objdump.imm_char = '#';
arch->associate_instruction_ops = arm64__associate_instruction_ops;
arch->extract_op_location = extract_op_location_arm64;
+#ifdef HAVE_LIBDW_SUPPORT
+ arch->update_insn_state = update_insn_state_arm64;
+#endif
/* bl, blr */
err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
--
2.34.1
* [PATCH v2 10/16] perf annotate-arm64: Support store instruction tracking
2026-04-03 9:47 [PATCH v2 00/16] perf arm64: Support data type profiling Tengda Wu
` (8 preceding siblings ...)
2026-04-03 9:47 ` [PATCH v2 09/16] perf annotate-arm64: Support load instruction tracking Tengda Wu
@ 2026-04-03 9:47 ` Tengda Wu
2026-04-03 9:47 ` [PATCH v2 11/16] perf annotate-arm64: Support stack variable tracking Tengda Wu
` (6 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Extend update_insn_state() for arm64 to handle store (STR) instructions.
Unlike load instructions, store operations do not change the data type
of the registers involved. However, arm64 store instructions sometimes
use pre-index or post-index addressing modes (e.g., str x1, [x0, #8]!),
which modify the base register as a side effect of the memory access.
Call adjust_reg_index_state() for store instructions to ensure the
base register's offset is correctly updated in the type state. This
maintains synchronization between the hardware register state and the
instruction tracker's model during sequential memory operations.
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
tools/perf/util/annotate-arch/annotate-arm64.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index cac2bf0021c9..28647a778802 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -335,6 +335,16 @@ static void update_insn_state_arm64(struct type_state *state,
}
return;
}
+
+ /* Register to memory transfers */
+ if (!strncmp(dl->ins.name, "st", 2)) {
+ /*
+ * Store instructions do not change the register type,
+ * but the base register must be updated for pre/post-index
+ * modes.
+ */
+ adjust_reg_index_state(state, dreg, dst, "str", insn_offset);
+ }
}
#endif
--
2.34.1
* [PATCH v2 11/16] perf annotate-arm64: Support stack variable tracking
2026-04-03 9:47 [PATCH v2 00/16] perf arm64: Support data type profiling Tengda Wu
` (9 preceding siblings ...)
2026-04-03 9:47 ` [PATCH v2 10/16] perf annotate-arm64: Support store " Tengda Wu
@ 2026-04-03 9:47 ` Tengda Wu
2026-04-03 9:47 ` [PATCH v2 12/16] perf annotate-arm64: Support 'mov' instruction tracking Tengda Wu
` (5 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Extend update_insn_state() for arm64 to track data types stored on the
stack. This allows 'perf annotate' to maintain type information for
local variables that are spilled to or loaded from stack slots.
The implementation handles:
1. Stack Loads (LDR): Identify when a register is loaded from a stack
slot and update the register's type state based on the tracked
stack content or compound member types.
2. Stack Stores (STR): Update or create new stack state entries when
a tracked register type is stored to the stack.
This enables the instruction tracker to follow data types as they move
between registers and memory, specifically for function local variables
and compiler-spilled values on arm64.
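The store/load round trip through a stack slot can be sketched with a tiny fixed-size table. This is a deliberately simplified model: every name here is hypothetical, a type-name string stands in for a DWARF type, and compound (struct-typed) slots, which the patch resolves through die_get_member_type(), are left out.

```c
#include <stddef.h>

#define SKETCH_MAX_SLOTS 8

/* Hypothetical stand-in for struct type_state_stack */
struct sketch_slot {
	int offset;		/* offset from the frame base */
	const char *type;	/* type name instead of a Dwarf_Die */
	int used;
};

struct sketch_stack {
	struct sketch_slot slots[SKETCH_MAX_SLOTS];
};

/* Look up the slot tracked at @offset, or NULL if nothing is tracked */
static struct sketch_slot *sketch_find(struct sketch_stack *st, int offset)
{
	for (size_t i = 0; i < SKETCH_MAX_SLOTS; i++) {
		if (st->slots[i].used && st->slots[i].offset == offset)
			return &st->slots[i];
	}
	return NULL;
}

/* str: update the existing slot or claim a free one; -1 if the table is full */
static int sketch_store(struct sketch_stack *st, int offset, const char *type)
{
	struct sketch_slot *slot = sketch_find(st, offset);

	for (size_t i = 0; slot == NULL && i < SKETCH_MAX_SLOTS; i++) {
		if (!st->slots[i].used)
			slot = &st->slots[i];
	}
	if (slot == NULL)
		return -1;

	slot->offset = offset;
	slot->type = type;
	slot->used = 1;
	return 0;
}

/* ldr: the loaded register takes the slot's type; NULL means "unknown" */
static const char *sketch_load(struct sketch_stack *st, int offset)
{
	struct sketch_slot *slot = sketch_find(st, offset);

	return slot ? slot->type : NULL;
}
```

A load from an untracked offset returns NULL here, which corresponds to the patch marking the destination register as not ok.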
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 83 ++++++++++++++++++-
1 file changed, 80 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 28647a778802..f9100230c2f6 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -11,6 +11,8 @@
#include "../disasm.h"
#include "../annotate-data.h"
#include "../debug.h"
+#include "../map.h"
+#include "../symbol.h"
struct arch_arm64 {
struct arch arch;
@@ -297,6 +299,8 @@ static void update_insn_state_arm64(struct type_state *state,
Dwarf_Die type_die;
u32 insn_offset = dl->al.offset;
int sreg, dreg;
+ int fbreg = dloc->fbreg;
+ int fboff = 0;
if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
return;
@@ -304,17 +308,59 @@ static void update_insn_state_arm64(struct type_state *state,
sreg = src->reg1;
dreg = dst->reg1;
+ if (dloc->fb_cfa) {
+ u64 ip = dloc->ms->sym->start + dl->al.offset;
+ u64 pc = map__rip_2objdump(dloc->ms->map, ip);
+
+ if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
+ fbreg = -1;
+ }
+
/* Memory to register transfers */
if (!strncmp(dl->ins.name, "ld", 2)) {
struct type_state_reg dst_tsr;
- if (!has_reg_type(state, sreg) ||
- !has_reg_type(state, dreg) ||
- !state->regs[dreg].ok)
+ if (!has_reg_type(state, sreg))
return;
tsr = &state->regs[sreg];
tsr->copied_from = -1;
+
+ /* Check stack variables with offset */
+ if (sreg == fbreg || sreg == state->stack_reg) {
+ struct type_state_stack *stack;
+ int offset = src->offset - fboff;
+
+ stack = find_stack_state(state, offset);
+ if (stack == NULL) {
+ tsr->ok = false;
+ return;
+ } else if (!stack->compound) {
+ tsr->type = stack->type;
+ tsr->kind = stack->kind;
+ tsr->offset = stack->ptr_offset;
+ tsr->ok = true;
+ } else if (die_get_member_type(&stack->type,
+ offset - stack->offset,
+ &type_die)) {
+ tsr->type = type_die;
+ tsr->kind = TSR_KIND_TYPE;
+ tsr->offset = 0;
+ tsr->ok = true;
+ } else {
+ tsr->ok = false;
+ return;
+ }
+
+ pr_debug_dtp("ldr [%x] -%#x(stack) -> reg%d",
+ insn_offset, -offset, sreg);
+ pr_debug_type_name(&tsr->type, tsr->kind);
+ return;
+ }
+
+ if (!has_reg_type(state, dreg) || !state->regs[dreg].ok)
+ return;
+
dst_tsr = state->regs[dreg];
/* Dereference the pointer if it has one */
@@ -338,6 +384,37 @@ static void update_insn_state_arm64(struct type_state *state,
/* Register to memory transfers */
if (!strncmp(dl->ins.name, "st", 2)) {
+ /* Check stack variables with offset */
+ if (dreg == fbreg || dreg == state->stack_reg) {
+ struct type_state_stack *stack;
+ int offset = dst->offset - fboff;
+
+ if (!has_reg_type(state, sreg) ||
+ !state->regs[sreg].ok)
+ return;
+
+ tsr = &state->regs[sreg];
+
+ stack = find_stack_state(state, offset);
+ if (stack) {
+ if (!stack->compound)
+ set_stack_state(stack, offset, tsr->kind,
+ &tsr->type, tsr->offset);
+ } else {
+ findnew_stack_state(state, offset, tsr->kind,
+ &tsr->type, tsr->offset);
+ }
+
+ pr_debug_dtp("str [%x] reg%d -> -%#x(stack)",
+ insn_offset, sreg, -offset);
+ if (tsr->offset != 0) {
+ pr_debug_dtp(" reg%d offset %#x ->",
+ sreg, tsr->offset);
+ }
+ pr_debug_type_name(&tsr->type, tsr->kind);
+ return;
+ }
+
/*
* Store instructions do not change the register type,
* but the base register must be updated for pre/post-index
--
2.34.1
* [PATCH v2 12/16] perf annotate-arm64: Support 'mov' instruction tracking
2026-04-03 9:47 [PATCH v2 00/16] perf arm64: Support data type profiling Tengda Wu
` (10 preceding siblings ...)
2026-04-03 9:47 ` [PATCH v2 11/16] perf annotate-arm64: Support stack variable tracking Tengda Wu
@ 2026-04-03 9:47 ` Tengda Wu
2026-04-03 9:47 ` [PATCH v2 13/16] perf annotate-arm64: Support 'add' " Tengda Wu
` (4 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Extend update_insn_state() for arm64 to handle register-to-register
'mov' instructions.
When a 'mov' instruction occurs between two registers, the data type
information (DWARF type, kind, and offset) needs to be propagated from
the source register to the destination register. This ensures that if
a pointer or a structure was previously identified in one register,
the tracker continues to recognize it after it is moved.
A real-world example is shown below:
ffff8000803eebf8 <get_vma_policy>:
ffff8000803eec20: mov x21, x0 // x0 (struct vm_area_struct*) -> x21
ffff8000803eec28: ldr x2, [x0, #112]
ffff8000803eec2c: cbz x2, ffff8000803eec94 <get_vma_policy+0x9c>
ffff8000803eec94: ldr x0, [x21, #152] // PMU sample
Before this commit, the type of x21 was unknown, causing the subsequent
inference to fail:
var [0] reg0 offset 0 type='struct vm_area_struct*' size=0x8
chk [9c] reg21 offset=0x98 ok=0 kind=0 cfa : no type information
final result: no type information
After this commit, the type of x21 is correctly inferred as 'struct vm_area_struct':
var [0] reg0 offset 0 type='struct vm_area_struct*' size=0x8
mov [28] reg0 -> reg21 type='struct vm_area_struct*' size=0x8
chk [9c] reg21 offset=0x98 ok=1 kind=1 (struct vm_area_struct*) : Good!
found by insn track: 0x98(reg21) type-offset=0x98
final result: type='struct vm_area_struct' size=0xb0
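The propagation rule for a register-to-register 'mov' can be sketched as a plain state copy. This is a minimal model with hypothetical names: a type-name string stands in for the DWARF type, and the copied_from bookkeeping done by the patch is omitted.

```c
#include <stdbool.h>

#define SKETCH_NR_REGS 32

/* Hypothetical stand-in for struct type_state_reg */
struct sketch_reg {
	const char *type;	/* type name instead of a Dwarf_Die */
	int offset;		/* offset within the pointed-to type */
	bool ok;		/* whether the type information is valid */
};

/*
 * Model 'mov xDST, xSRC': the destination inherits the source's type
 * state. If the source has no known type, the destination becomes
 * unknown, because its previous contents were overwritten.
 */
static void sketch_mov(struct sketch_reg *regs, int dst, int src)
{
	if (dst < 0 || dst >= SKETCH_NR_REGS)
		return;

	if (src < 0 || src >= SKETCH_NR_REGS || !regs[src].ok) {
		regs[dst].ok = false;
		return;
	}

	regs[dst] = regs[src];
}
```

In the get_vma_policy example above, this corresponds to x21 inheriting 'struct vm_area_struct*' from x0, so the later 'ldr x0, [x21, #152]' can be resolved.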
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 28 +++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index f9100230c2f6..013b673f4861 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -308,6 +308,34 @@ static void update_insn_state_arm64(struct type_state *state,
sreg = src->reg1;
dreg = dst->reg1;
+ /* Register to register transfers */
+ if (!strcmp(dl->ins.name, "mov")) {
+ if (!has_reg_type(state, sreg))
+ return;
+
+ tsr = &state->regs[sreg];
+ tsr->copied_from = -1;
+
+ if (!has_reg_type(state, dreg) ||
+ !state->regs[dreg].ok) {
+ tsr->ok = false;
+ return;
+ }
+
+ tsr->type = state->regs[dreg].type;
+ tsr->kind = state->regs[dreg].kind;
+ tsr->offset = state->regs[dreg].offset;
+ tsr->ok = true;
+
+ if (tsr->kind == TSR_KIND_TYPE || tsr->kind == TSR_KIND_POINTER)
+ tsr->copied_from = dreg;
+
+ pr_debug_dtp("mov [%x] reg%d -> reg%d",
+ insn_offset, dreg, sreg);
+ pr_debug_type_name(&tsr->type, tsr->kind);
+ return;
+ }
+
if (dloc->fb_cfa) {
u64 ip = dloc->ms->sym->start + dl->al.offset;
u64 pc = map__rip_2objdump(dloc->ms->map, ip);
--
2.34.1
* [PATCH v2 13/16] perf annotate-arm64: Support 'add' instruction tracking
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Extend update_insn_state() for arm64 to track pointer arithmetic and
member address calculations.
The arm64 'add' instruction frequently calculates structure member
addresses, such as 'add x0, x1, #offset'. Tracking this is essential
to maintain the connection between a base pointer and its derived
member addresses.
The implementation checks whether the base register holds a typed
pointer (TSR_KIND_POINTER, or TSR_KIND_TYPE with a DWARF pointer type).
When an immediate offset is added, it uses die_get_member_type() to
verify that the resulting offset points to a valid member within the
pointed-to type. If valid, it updates the target register's type state
with the new offset while preserving the base type information.
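The offset check can be illustrated with a small standalone sketch. It assumes a flattened member table in place of DWARF: 'struct member', 'struct layout', find_member() and track_add_imm() are hypothetical names, and find_member() only plays the role that die_get_member_type() has in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened layout; perf reads this from DWARF instead. */
struct member {
	const char *name;
	int offset;
	int size;
};

struct layout {
	const struct member *members;
	size_t nr_members;
};

/* Return the member covering 'offset', or NULL if none does. */
const struct member *find_member(const struct layout *l, int offset)
{
	assert(l != NULL);

	for (size_t i = 0; i < l->nr_members; i++) {
		const struct member *m = &l->members[i];

		if (offset >= m->offset && offset < m->offset + m->size)
			return m;
	}
	return NULL;
}

struct ptr_state {
	const struct layout *pointee;	/* type the pointer refers to */
	int offset;			/* offset already applied */
	bool ok;
};

/*
 * Model "add xTo, xBase, #imm": accept the result only if the new
 * offset lands inside a known member. On failure 'to' is left as is.
 */
bool track_add_imm(struct ptr_state *to, const struct ptr_state *base, int imm)
{
	int offset;

	assert(to != NULL && base != NULL);

	if (!base->ok)
		return false;

	offset = base->offset + imm;
	if (find_member(base->pointee, offset) == NULL)
		return false;

	to->pointee = base->pointee;
	to->offset = offset;
	to->ok = true;
	return true;
}
```

Keeping the base type and accumulating the offset, rather than switching to the member's type, lets a later 'ldr [xTo]' report the access as an offset into the containing structure.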
A real-world example is shown below:
ffff80008001c9a8 <flush_ptrace_hw_breakpoint>:
ffff80008001c9c4: add x19, x0, #0xeb8 // x0 (task_struct*) + 0xeb8 -> x19
ffff80008001c9d0: ldr x0, [x19] // PMU sample
Before this commit, the type flow broke at the 'add' instruction,
leaving the subsequent load with no type information:
chk [28] reg19 offset=0 ok=0 kind=0 cfa : no type information
final result: no type information
After this commit, the tracker correctly follows the member address
calculation:
var [0] reg0 offset 0 type='struct task_struct*'
add [1c] address of 0xeb8(reg0) -> reg19 type='struct task_struct*'
chk [28] reg19 offset=0 ok=1 kind=1 (struct task_struct*) : Good!
found by insn track: 0(reg19) type-offset=0xeb8
final result: type='struct task_struct'
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 45 +++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 013b673f4861..d2557b9d6909 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -7,6 +7,7 @@
#include <linux/zalloc.h>
#include <linux/string.h>
#include <regex.h>
+#include <inttypes.h>
#include "../annotate.h"
#include "../disasm.h"
#include "../annotate-data.h"
@@ -308,6 +309,50 @@ static void update_insn_state_arm64(struct type_state *state,
sreg = src->reg1;
dreg = dst->reg1;
+ if (!strcmp(dl->ins.name, "add")) {
+ struct type_state_reg dst_tsr;
+
+ if (!has_reg_type(state, sreg) ||
+ !has_reg_type(state, dreg) ||
+ !state->regs[dreg].ok)
+ return;
+
+ tsr = &state->regs[sreg];
+ tsr->copied_from = -1;
+ dst_tsr = state->regs[dreg];
+
+ /* Handle calculation of a register holding a typed pointer */
+ if (dst_tsr.kind == TSR_KIND_POINTER ||
+ (dst_tsr.kind == TSR_KIND_TYPE &&
+ dwarf_tag(&dst_tsr.type) == DW_TAG_pointer_type)) {
+ s32 offset;
+
+ if (dst_tsr.kind == TSR_KIND_TYPE &&
+ __die_get_real_type(&dst_tsr.type, &type_die) == NULL)
+ return;
+
+ if (dst_tsr.kind == TSR_KIND_POINTER)
+ type_die = dst_tsr.type;
+
+ /* Check if the target type has a member at the new offset */
+ offset = dst->offset + dst_tsr.offset;
+ if (die_get_member_type(&type_die, offset, &type_die) == NULL)
+ return;
+
+ tsr->type = dst_tsr.type;
+ tsr->kind = dst_tsr.kind;
+ tsr->offset = offset;
+ tsr->ok = true;
+
+ pr_debug_dtp("add [%x] address of %s%#x(reg%d) -> reg%d",
+ insn_offset, dst->offset < 0 ? "-" : "",
+ abs(dst->offset), dreg, sreg);
+
+ pr_debug_type_name(&tsr->type, tsr->kind);
+ }
+ return;
+ }
+
/* Register to register transfers */
if (!strcmp(dl->ins.name, "mov")) {
if (!has_reg_type(state, sreg))
--
2.34.1
* [PATCH v2 14/16] perf annotate-arm64: Support 'adrp' instruction to track global variables
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Extend update_insn_state() for arm64 to track global variable types
calculated via PC-relative addressing.
On arm64, global variables are typically accessed by first calculating
the page address using 'adrp', followed by an 'add' or 'ldr' to get the
specific symbol address. Without tracking 'adrp', the instruction
tracker loses the base address, making it impossible to resolve
global symbols and their associated DWARF types.
Introduce TSR_KIND_GLOBAL_ADDR to represent a partial global address
state. When encountering 'adrp', store the page-aligned target address
in the register's type state. Upon a subsequent 'add' or 'ldr'
instruction that references a TSR_KIND_GLOBAL_ADDR register, combine
the page address with the immediate offset.
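The address arithmetic behind an adrp/add pair can be sketched as below. This is only an illustration under stated assumptions: adrp_target() and adrp_add() are hypothetical helpers, and perf itself takes the already-resolved target from the disassembler output rather than decoding the page count.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MASK_4K UINT64_C(0xfff)

/*
 * Address produced by 'adrp': the 4 KiB page containing the
 * instruction, plus a signed number of pages from the encoding.
 */
uint64_t adrp_target(uint64_t pc, int64_t page_delta)
{
	/* Unsigned arithmetic so that negative deltas wrap correctly. */
	return (pc & ~PAGE_MASK_4K) + (uint64_t)page_delta * 4096;
}

/*
 * 'add xN, xN, #imm' following 'adrp': supply the low 12 bits of the
 * symbol address.
 */
uint64_t adrp_add(uint64_t page_addr, uint32_t imm12)
{
	assert(imm12 <= 0xfff);
	return page_addr + imm12;
}
```

For an 'adrp' at pc 0xffff80008032e048, a page delta of 0x1d01 (derived here from the two addresses, not read from an encoding) gives 0xffff80008202f000, and adding 0xd40 gives 0xffff80008202fd40.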
A real-world example is shown below:
ffff80008032e008 <folios_put_refs>:
ffff80008032e048: adrp x24, ffff80008202f000 <nr_cpu_ids>
ffff80008032e050: add x24, x24, #0xd40
ffff80008032e078: ldr x0, [x24] // PMU sample
Before this commit, x24 was unknown, leading to no type information:
chk [70] reg24 offset=0 ok=0 kind=0 cfa : no type information
final result: no type information
After this commit, the tracker correctly follows the adrp/add flow:
adrp [40] global addr=ffff80008202f000 -> reg24
add [48] global addr=ffff80008202fd40 -> reg24
chk [70] reg24 offset=0 ok=1 kind=7 global addr : Good!
final result: type='struct folio*'
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 64 ++++++++++++++++++-
tools/perf/util/annotate-arch/annotate-x86.c | 6 +-
tools/perf/util/annotate-data.c | 32 ++++++++--
tools/perf/util/annotate-data.h | 7 +-
4 files changed, 97 insertions(+), 12 deletions(-)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index d2557b9d6909..6b954bbfaf8d 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -290,7 +290,7 @@ static void adjust_reg_index_state(struct type_state *state, int reg,
}
static void update_insn_state_arm64(struct type_state *state,
- struct data_loc_info *dloc, Dwarf_Die * cu_die __maybe_unused,
+ struct data_loc_info *dloc, Dwarf_Die *cu_die,
struct disasm_line *dl)
{
struct annotated_insn_loc loc;
@@ -309,6 +309,23 @@ static void update_insn_state_arm64(struct type_state *state,
sreg = src->reg1;
dreg = dst->reg1;
+ if (!strcmp(dl->ins.name, "adrp")) {
+ if (!has_reg_type(state, sreg) || !dl->ops.target.addr)
+ return;
+
+ tsr = &state->regs[sreg];
+ tsr->copied_from = -1;
+ tsr->kind = TSR_KIND_GLOBAL_ADDR;
+ /* Partial page-relative address, finalized in next 'add/ldr' */
+ tsr->addr = dl->ops.target.addr;
+ tsr->offset = 0;
+ tsr->ok = true;
+
+ pr_debug_dtp("adrp [%x] global addr=%"PRIx64" -> reg%d\n",
+ insn_offset, tsr->addr, sreg);
+ return;
+ }
+
if (!strcmp(dl->ins.name, "add")) {
struct type_state_reg dst_tsr;
@@ -342,6 +359,7 @@ static void update_insn_state_arm64(struct type_state *state,
tsr->type = dst_tsr.type;
tsr->kind = dst_tsr.kind;
tsr->offset = offset;
+ tsr->addr = 0;
tsr->ok = true;
pr_debug_dtp("add [%x] address of %s%#x(reg%d) -> reg%d",
@@ -350,6 +368,18 @@ static void update_insn_state_arm64(struct type_state *state,
pr_debug_type_name(&tsr->type, tsr->kind);
}
+
+ /* Handle PC-relative global address calculation (adrp/add pair) */
+ if (dst_tsr.kind == TSR_KIND_GLOBAL_ADDR) {
+ tsr->kind = dst_tsr.kind;
+ tsr->addr = dst_tsr.addr + dst->offset;
+ tsr->offset = 0;
+ tsr->ok = true;
+
+ pr_debug_dtp("add [%x] global addr=%"PRIx64" -> reg%d\n",
+ insn_offset, tsr->addr, sreg);
+ }
+
return;
}
@@ -370,6 +400,7 @@ static void update_insn_state_arm64(struct type_state *state,
tsr->type = state->regs[dreg].type;
tsr->kind = state->regs[dreg].kind;
tsr->offset = state->regs[dreg].offset;
+ tsr->addr = state->regs[dreg].addr;
tsr->ok = true;
if (tsr->kind == TSR_KIND_TYPE || tsr->kind == TSR_KIND_POINTER)
@@ -444,6 +475,7 @@ static void update_insn_state_arm64(struct type_state *state,
tsr->type = type_die;
tsr->kind = TSR_KIND_TYPE;
tsr->offset = 0;
+ tsr->addr = 0;
tsr->ok = true;
pr_debug_dtp("ldr [%x] %#x(reg%d) -> reg%d",
@@ -451,6 +483,30 @@ static void update_insn_state_arm64(struct type_state *state,
pr_debug_type_name(&tsr->type, tsr->kind);
adjust_reg_index_state(state, dreg, dst, "ldr", insn_offset);
+ return;
+ }
+
+ /* Or check if it's a global variable */
+ if (dst_tsr.kind == TSR_KIND_GLOBAL_ADDR) {
+ u64 ip = dloc->ms->sym->start + dl->al.offset;
+ u64 addr = dst_tsr.addr + dst->offset;
+ int offset;
+
+ if (!get_global_var_type(cu_die, dloc, ip, addr, &offset,
+ &type_die) ||
+ !die_get_member_type(&type_die, offset, &type_die)) {
+ tsr->ok = false;
+ return;
+ }
+
+ tsr->type = type_die;
+ tsr->kind = TSR_KIND_TYPE;
+ tsr->offset = offset;
+ tsr->addr = addr;
+ tsr->ok = true;
+ pr_debug_dtp("ldr [%x] global (%"PRIx64") -> reg%d",
+ insn_offset, addr, sreg);
+ pr_debug_type_name(&tsr->type, tsr->kind);
}
return;
}
@@ -472,10 +528,12 @@ static void update_insn_state_arm64(struct type_state *state,
if (stack) {
if (!stack->compound)
set_stack_state(stack, offset, tsr->kind,
- &tsr->type, tsr->offset);
+ &tsr->type, tsr->offset,
+ tsr->addr);
} else {
findnew_stack_state(state, offset, tsr->kind,
- &tsr->type, tsr->offset);
+ &tsr->type, tsr->offset,
+ tsr->addr);
}
pr_debug_dtp("str [%x] reg%d -> -%#x(stack)",
diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
index c63ca3250b95..24adfbef8b76 100644
--- a/tools/perf/util/annotate-arch/annotate-x86.c
+++ b/tools/perf/util/annotate-arch/annotate-x86.c
@@ -780,10 +780,12 @@ static void update_insn_state_x86(struct type_state *state,
*/
if (!stack->compound)
set_stack_state(stack, offset, tsr->kind,
- &tsr->type, tsr->offset);
+ &tsr->type, tsr->offset,
+ tsr->addr);
} else {
findnew_stack_state(state, offset, tsr->kind,
- &tsr->type, tsr->offset);
+ &tsr->type, tsr->offset,
+ tsr->addr);
}
if (dst->reg1 == fbreg) {
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index fd6416d43a2e..b75d50b2c46f 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -70,6 +70,9 @@ void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
case TSR_KIND_CANARY:
pr_info(" stack canary\n");
return;
+ case TSR_KIND_GLOBAL_ADDR:
+ pr_info(" global address\n");
+ return;
case TSR_KIND_TYPE:
default:
break;
@@ -577,7 +580,7 @@ struct type_state_stack *find_stack_state(struct type_state *state,
}
void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
- Dwarf_Die *type_die, int ptr_offset)
+ Dwarf_Die *type_die, int ptr_offset, u64 addr)
{
int tag;
Dwarf_Word size;
@@ -594,6 +597,7 @@ void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
stack->offset = offset;
stack->ptr_offset = ptr_offset;
stack->kind = kind;
+ stack->addr = addr;
if (kind == TSR_KIND_POINTER) {
stack->compound = false;
@@ -616,18 +620,18 @@ void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
struct type_state_stack *findnew_stack_state(struct type_state *state,
int offset, u8 kind,
Dwarf_Die *type_die,
- int ptr_offset)
+ int ptr_offset, u64 addr)
{
struct type_state_stack *stack = find_stack_state(state, offset);
if (stack) {
- set_stack_state(stack, offset, kind, type_die, ptr_offset);
+ set_stack_state(stack, offset, kind, type_die, ptr_offset, addr);
return stack;
}
stack = malloc(sizeof(*stack));
if (stack) {
- set_stack_state(stack, offset, kind, type_die, ptr_offset);
+ set_stack_state(stack, offset, kind, type_die, ptr_offset, addr);
list_add(&stack->list, &state->stack_vars);
}
return stack;
@@ -913,7 +917,7 @@ static void update_var_state(struct type_state *state, struct data_loc_info *dlo
continue;
findnew_stack_state(state, offset, TSR_KIND_TYPE,
- &mem_die, /*ptr_offset=*/0);
+ &mem_die, /*ptr_offset=*/0, /*addr=*/0);
if (var->reg == state->stack_reg) {
pr_debug_dtp("var [%"PRIx64"] %#x(reg%d)",
@@ -1256,6 +1260,24 @@ static enum type_match_result check_matching_type(struct type_state *state,
if (dloc->op->offset < 0 && reg != state->stack_reg && reg != dloc->fbreg)
goto check_kernel;
}
+
+ if (state->regs[reg].kind == TSR_KIND_GLOBAL_ADDR) {
+ int var_offset;
+
+ pr_debug_dtp("global addr");
+
+ /*
+ * The register holds the address of a global variable. Try to
+ * find the variable by the address and get its type.
+ */
+ if (get_global_var_type(cu_die, dloc, dloc->ip, state->regs[reg].addr,
+ &var_offset, type_die)) {
+ dloc->type_offset = var_offset;
+ return PERF_TMR_OK;
+ }
+ /* No need to retry global variables */
+ return PERF_TMR_BAIL_OUT;
+ }
check_non_register:
if (reg == dloc->fbreg || reg == state->stack_reg) {
struct type_state_stack *stack;
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index c26130744260..bae15e1d6db8 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -37,6 +37,7 @@ enum type_state_kind {
TSR_KIND_PERCPU_POINTER,
TSR_KIND_POINTER,
TSR_KIND_CANARY,
+ TSR_KIND_GLOBAL_ADDR,
};
/**
@@ -187,6 +188,7 @@ struct type_state_reg {
u64 lifetime_end;
u8 kind;
u8 copied_from;
+ u64 addr;
};
/* Type information in a stack location, dynamically allocated */
@@ -199,6 +201,7 @@ struct type_state_stack {
int size;
bool compound;
u8 kind;
+ u64 addr;
};
/*
@@ -253,9 +256,9 @@ bool has_reg_type(struct type_state *state, int reg);
struct type_state_stack *findnew_stack_state(struct type_state *state,
int offset, u8 kind,
Dwarf_Die *type_die,
- int ptr_offset);
+ int ptr_offset, u64 addr);
void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
- Dwarf_Die *type_die, int ptr_offset);
+ Dwarf_Die *type_die, int ptr_offset, u64 addr);
struct type_state_stack *find_stack_state(struct type_state *state,
int offset);
bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
--
2.34.1
* [PATCH v2 15/16] perf annotate-arm64: Support per-cpu variable access tracking
From: Tengda Wu @ 2026-04-03 9:47 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Extend update_insn_state() for arm64 to handle per-cpu variable
addressing.
On arm64, per-cpu variables are accessed by adding a per-cpu offset
(typically from the '__per_cpu_offset' array) to the global variable's
address. This logic often results in the following instruction pattern:
adrp x4, <page>
add x4, x4, #offset // x4 = &__per_cpu_offset
ldr x6, [x4, w0, sxtw #3] // x6 = __per_cpu_offset[cpu]
...
adrp x5, <page>
add x5, x5, #offset // x5 = &global_var
ldr x0, [x6, x5] // Pattern A: direct load per-cpu instance
OR
add x0, x6, x5 // Pattern B: compute per-cpu addr
To handle such cases:
1. Identify per-cpu base initialization: Detect an 'ldr' whose base
register holds an 'adrp/add' address that resolves to the
'__per_cpu_offset' symbol, and mark the loaded register as
TSR_KIND_PERCPU_BASE.
2. Propagate type information: During subsequent 'ldr' or 'add' steps,
if one operand is a PERCPU_BASE and the other is a global variable,
inherit the type from the global variable to correctly identify the
per-cpu instance.
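The two steps above can be modelled with a small state sketch. The names here ('enum kind', 'struct reg', 'struct sym', track_ldr_global(), percpu_var_addr()) are hypothetical and much simpler than the perf structures; a name/address pair stands in for the symbol lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum kind { K_INVALID = 0, K_TYPE, K_PERCPU_BASE, K_GLOBAL_ADDR };

struct reg {
	enum kind kind;
	uint64_t addr;	/* meaningful for K_GLOBAL_ADDR only */
	bool ok;
};

/* Hypothetical result of looking up a global address in the symbol table. */
struct sym {
	const char *name;
	uint64_t addr;
};

/*
 * Step 1: "ldr xTo, [xBase, ...]" where xBase holds a global address.
 * A load from '__per_cpu_offset' yields a per-cpu base; any other
 * global yields an ordinary typed value. 'to' may alias 'base'.
 */
void track_ldr_global(struct reg *to, const struct reg *base,
		      const struct sym *s)
{
	assert(to != NULL && base != NULL && s != NULL);

	if (!base->ok || base->kind != K_GLOBAL_ADDR || base->addr != s->addr) {
		to->ok = false;
		return;
	}

	to->kind = strcmp(s->name, "__per_cpu_offset") == 0 ?
		   K_PERCPU_BASE : K_TYPE;
	to->addr = 0;
	to->ok = true;
}

/*
 * Step 2: "[xBase, xIndex]" with a per-cpu base and a global address.
 * The variable to look up is the one the index register points at.
 */
bool percpu_var_addr(const struct reg *base, const struct reg *index,
		     uint64_t *var_addr)
{
	assert(base != NULL && index != NULL && var_addr != NULL);

	if (!base->ok || base->kind != K_PERCPU_BASE)
		return false;
	if (!index->ok || index->kind != K_GLOBAL_ADDR)
		return false;

	*var_addr = index->addr;
	return true;
}
```

The type then comes from the variable at the returned address, not from the '__per_cpu_offset' array.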
A real-world example is shown below:
ffff8000808f2d28 <cppc_set_perf>:
ffff8000808f2d38: adrp x2, ffff800082033000 <event_array+0x58>
ffff8000808f2d3c: add x5, x2, #0x3f8 // x5 = &__per_cpu_offset
ffff8000808f2d44: adrp x2, ffff800081f73000 <vmcore_cb_srcu_srcu_data+0x80>
ffff8000808f2d48: add x2, x2, #0x6b8 // x2 = &cpu_pcc_subspace_idx
ffff8000808f2d6c: ldr x5, [x5, w0, sxtw #3] // x5 = __per_cpu_offset[cpu]
ffff8000808f2d80: ldr w23, [x5, x2] // PMU sample, per_cpu(cpu_pcc_subspace_idx, cpu)
Before this commit, the tracker could not link x5 back to a per-cpu
context, resulting in an incorrect data type resolution:
adrp [10] global addr=ffff800082033000 -> reg2
add [14] global addr=ffff8000820333f8 -> reg5
adrp [1c] global addr=ffff800081f73000 -> reg2
add [20] global addr=ffff800081f736b8 -> reg2
ldr [44] global (ffff8000820333f8) -> reg5 type='long unsigned int[]' size=0x1000
chk [58] reg5 offset=0 ok=1 kind=1 (long unsigned int[]) : Good!
found by insn track: 0(reg5, reg2) type-offset=0
final result: type='long unsigned int' size=0x8
After this commit, the tracker correctly identifies the per-cpu flow and
resolves the actual variable type:
ldr [44] global (ffff8000820333f8) -> reg5 percpu base
chk [58] reg5 offset=0 ok=1 kind=2 percpu var : Good!
found by insn track: 0(reg5, reg2) type-offset=0
final result: type='int' size=0x4
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 69 ++++++++++++++++++-
tools/perf/util/annotate-data.c | 33 ++++++---
2 files changed, 92 insertions(+), 10 deletions(-)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 6b954bbfaf8d..89b6b596f984 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -378,6 +378,26 @@ static void update_insn_state_arm64(struct type_state *state,
pr_debug_dtp("add [%x] global addr=%"PRIx64" -> reg%d\n",
insn_offset, tsr->addr, sreg);
+ return;
+ }
+
+ /* Handle per-cpu base addresses */
+ if (dst_tsr.kind == TSR_KIND_PERCPU_BASE) {
+ if (!dst->multi_regs || !has_reg_type(state, dst->reg2) ||
+ state->regs[dst->reg2].kind != TSR_KIND_GLOBAL_ADDR ||
+ !state->regs[dst->reg2].ok)
+ return;
+
+ /* Inherit type from the global variable */
+ tsr->type = state->regs[dst->reg2].type;
+ tsr->kind = state->regs[dst->reg2].kind;
+ tsr->offset = state->regs[dst->reg2].offset;
+ tsr->addr = state->regs[dst->reg2].addr;
+ tsr->ok = true;
+
+ pr_debug_dtp("add [%x] percpu %#"PRIx64" -> reg%d",
+ insn_offset, tsr->addr, sreg);
+ pr_debug_type_name(&tsr->type, tsr->kind);
}
return;
@@ -491,6 +511,15 @@ static void update_insn_state_arm64(struct type_state *state,
u64 ip = dloc->ms->sym->start + dl->al.offset;
u64 addr = dst_tsr.addr + dst->offset;
int offset;
+ u8 kind;
+ const char *var_name = NULL;
+
+ /* it might be per-cpu offset */
+ if (get_global_var_info(dloc, addr, &var_name, &offset) &&
+ !strcmp(var_name, "__per_cpu_offset"))
+ kind = TSR_KIND_PERCPU_BASE;
+ else
+ kind = TSR_KIND_TYPE;
if (!get_global_var_type(cu_die, dloc, ip, addr, &offset,
&type_die) ||
@@ -500,13 +529,49 @@ static void update_insn_state_arm64(struct type_state *state,
}
tsr->type = type_die;
- tsr->kind = TSR_KIND_TYPE;
+ tsr->kind = kind;
tsr->offset = offset;
- tsr->addr = addr;
+ tsr->addr = 0;
tsr->ok = true;
+
pr_debug_dtp("ldr [%x] global (%"PRIx64") -> reg%d",
insn_offset, addr, sreg);
pr_debug_type_name(&tsr->type, tsr->kind);
+ return;
+ }
+
+ /* Or check if it's a per-cpu base address */
+ if (dst_tsr.kind == TSR_KIND_PERCPU_BASE) {
+ u64 ip = dloc->ms->sym->start + dl->al.offset;
+ u64 addr;
+ int offset;
+ /*
+ * If reg2 is a global variable, this means reg1 is
+ * an index into the variable's per-cpu array, so
+ * dereference type from reg2.
+ */
+ if (!dst->multi_regs || !has_reg_type(state, dst->reg2) ||
+ state->regs[dst->reg2].kind != TSR_KIND_GLOBAL_ADDR ||
+ !state->regs[dst->reg2].ok)
+ return;
+
+ addr = state->regs[dst->reg2].addr;
+ if (!get_global_var_type(cu_die, dloc, ip, addr, &offset,
+ &type_die) ||
+ !die_get_member_type(&type_die, offset, &type_die)) {
+ tsr->ok = false;
+ return;
+ }
+
+ tsr->type = type_die;
+ tsr->kind = TSR_KIND_TYPE;
+ tsr->offset = offset;
+ tsr->addr = 0;
+ tsr->ok = true;
+
+ pr_debug_dtp("ldr [%x] percpu (reg%d, reg%d) -> reg%d",
+ insn_offset, dreg, dst->reg2, sreg);
+ pr_debug_type_name(&tsr->type, tsr->kind);
}
return;
}
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index b75d50b2c46f..7161417d1c76 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1230,20 +1230,37 @@ static enum type_match_result check_matching_type(struct type_state *state,
}
if (state->regs[reg].kind == TSR_KIND_PERCPU_BASE) {
- u64 var_addr = dloc->op->offset;
+ u64 var_addr;
int var_offset;
pr_debug_dtp("percpu var");
- if (dloc->op->multi_regs) {
- int reg2 = dloc->op->reg2;
+ if (arch__is_arm64(dloc->arch)) {
+ int reg2;
- if (dloc->op->reg2 == reg)
- reg2 = dloc->op->reg1;
+ if (!dloc->op->multi_regs)
+ return PERF_TMR_BAIL_OUT;
- if (has_reg_type(state, reg2) && state->regs[reg2].ok &&
- state->regs[reg2].kind == TSR_KIND_CONST)
- var_addr += state->regs[reg2].imm_value;
+ reg2 = dloc->op->reg2;
+ if (!has_reg_type(state, reg2) ||
+ state->regs[reg2].kind != TSR_KIND_GLOBAL_ADDR ||
+ !state->regs[reg2].ok)
+ return PERF_TMR_BAIL_OUT;
+
+ var_addr = state->regs[reg2].addr;
+ } else {
+ var_addr = dloc->op->offset;
+
+ if (dloc->op->multi_regs) {
+ int reg2 = dloc->op->reg2;
+
+ if (dloc->op->reg2 == reg)
+ reg2 = dloc->op->reg1;
+
+ if (has_reg_type(state, reg2) && state->regs[reg2].ok &&
+ state->regs[reg2].kind == TSR_KIND_CONST)
+ var_addr += state->regs[reg2].imm_value;
+ }
}
if (get_global_var_type(cu_die, dloc, dloc->ip, var_addr,
--
2.34.1
* [PATCH v2 16/16] perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer
From: Tengda Wu @ 2026-04-03 9:48 UTC (permalink / raw)
To: Peter Zijlstra, Namhyung Kim, leo.yan, Li Huafei, Ian Rogers,
Kim Phillips, Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar
Cc: Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm,
Tengda Wu
Extend update_insn_state() for arm64 to handle the 'mrs' instruction,
enabling the tracking of the 'current' task pointer in the kernel.
On arm64, the kernel uses the 'sp_el0' system register to store the
address of the currently executing 'struct task_struct'. This is
typically accessed via the 'get_current()' inline function, resulting
in the instruction 'mrs xN, sp_el0'.
To resolve the data type of the target register, first verify the
access is to 'sp_el0' within a kernel DSO. Then, locate the
'get_current()' inline function's DWARF DIE at the current PC and
extract its return type (which is 'struct task_struct *').
Introduce a file-scope 'task_struct_off' cache to store the DWARF offset
of this type. This is particularly important because the compiler-generated
stack canary check code (which loads from 'current') often exists in
code sections or leaf functions where the local Compilation Unit (CU)
lacks a full 'struct task_struct' definition. Caching the offset allows
'perf annotate' to consistently resolve task-related fields across the
entire kernel binary.
A real-world example is shown below:
ffff8000800deee8 <kthread_blkcg>:
ffff8000800deef0: mrs x0, sp_el0 // x0 = current
ffff8000800deef4: ldr w1, [x0, #44] // Access task_struct member
Before this commit, the type flow starts with no information:
chk [c] reg0 offset=0x2c ok=0 kind=0 cfa : no type information
final result: no type information
After this commit, the tracker identifies the 'current' pointer
from the system register:
mrs [8] sp_el0 -> reg0 type='struct task_struct*'
chk [c] reg0 offset=0x2c ok=1 kind=1 (struct task_struct*) : Good!
found by insn track: 0x2c(reg0) type-offset=0x2c
final result: type='struct task_struct'
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
.../perf/util/annotate-arch/annotate-arm64.c | 53 +++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 89b6b596f984..b03b12594260 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -14,6 +14,7 @@
#include "../debug.h"
#include "../map.h"
#include "../symbol.h"
+#include "../dso.h"
struct arch_arm64 {
struct arch arch;
@@ -289,6 +290,8 @@ static void adjust_reg_index_state(struct type_state *state, int reg,
pr_debug_type_name(&tsr->type, tsr->kind);
}
+static Dwarf_Off task_struct_off;
+
static void update_insn_state_arm64(struct type_state *state,
struct data_loc_info *dloc, Dwarf_Die *cu_die,
struct disasm_line *dl)
@@ -309,6 +312,56 @@ static void update_insn_state_arm64(struct type_state *state,
sreg = src->reg1;
dreg = dst->reg1;
+ if (!strcmp(dl->ins.name, "mrs")) {
+ Dwarf_Die func_die;
+ Dwarf_Attribute attr;
+ u64 ip, pc;
+
+ if (!has_reg_type(state, sreg))
+ return;
+
+ /* Handle case difference: LLVM (SP_EL0) vs objdump (sp_el0) */
+ if (!dso__kernel(map__dso(dloc->ms->map)) ||
+ strcasecmp(dl->ops.target.raw, "sp_el0"))
+ return;
+
+ ip = dloc->ms->sym->start + dl->al.offset;
+ pc = map__rip_2objdump(dloc->ms->map, ip);
+
+ if (!task_struct_off ||
+ !dwarf_offdie(dloc->di->dbg, task_struct_off, &type_die)) {
+ /*
+ * Find the inline function 'get_current()' Dwarf_Die
+ * and obtain its return value data type, which should
+ * be 'struct task_struct *'.
+ */
+ if (!die_find_inlinefunc(cu_die, pc, &func_die) ||
+ !dwarf_attr_integrate(&func_die, DW_AT_type, &attr) ||
+ !dwarf_formref_die(&attr, &type_die))
+ return;
+
+ /*
+ * Cache the 'struct task_struct *' die offset globally.
+ * This allows us to resolve stack canary accesses even
+ * in CUs that lack a full task_struct definition (e.g.,
+ * compiler-generated entry/exit code).
+ */
+ task_struct_off = dwarf_dieoffset(&type_die);
+ }
+
+ tsr = &state->regs[sreg];
+ tsr->copied_from = -1;
+ tsr->type = type_die;
+ tsr->kind = TSR_KIND_TYPE;
+ tsr->offset = 0;
+ tsr->addr = 0;
+ tsr->ok = true;
+
+ pr_debug_dtp("mrs [%x] sp_el0 -> reg%d", insn_offset, sreg);
+ pr_debug_type_name(&type_die, tsr->kind);
+ return;
+ }
+
if (!strcmp(dl->ins.name, "adrp")) {
if (!has_reg_type(state, sreg) || !dl->ops.target.addr)
return;
--
2.34.1
* Re: [PATCH v2 00/16] perf arm64: Support data type profiling
From: Namhyung Kim @ 2026-04-07 6:31 UTC (permalink / raw)
To: Tengda Wu
Cc: Peter Zijlstra, leo.yan, Li Huafei, Ian Rogers, Kim Phillips,
Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar,
Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm
Hello,
On Fri, Apr 03, 2026 at 09:47:44AM +0000, Tengda Wu wrote:
> This patch series implements data type profiling support for arm64,
> building upon the foundational work previously contributed by Huafei [1].
> While the initial version laid the groundwork for arm64 data type analysis,
> this series iterates on that work by refining instruction parsing and
> extending support for core architectural features.
Thanks for working on this! I'm happy to see that the changes are well
organized and that each commit explains the issues clearly.
>
> The series is organized as follows:
>
> 1. Fix disassembly mismatches (Patches 01-02)
> Current perf annotate supports three disassembly backends: llvm,
> capstone, and objdump. On arm64, inconsistencies between the output
> of these backends (specifically llvm/capstone vs. objdump) often
> prevent the tracker from correctly identifying registers and offsets.
> These patches resolve these mismatches, ensuring consistent instruction
> parsing across all supported backends.
>
> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
> These patches establish the necessary infrastructure for arm64-specific
> operand handling. This includes implementing new callbacks and data
> structures to manage arm64's unique addressing modes and register sets.
> This foundation is essential for the subsequent type-tracking logic.
I've only checked up to this part so far. I'll send replies soon and
continue reviewing later this week.
>
> 3. Core instruction tracking (Patches 08-16)
> These patches implement the core logic for type tracking on arm64,
> covering a wide range of instructions including:
>
> * Memory Access: ldr/str variants (including stack-based access).
> * Arithmetic & Data Processing: mov, add, and adrp.
> * Special Access: System register access (mrs) and per-cpu variable
> tracking.
>
> The implementation draws inspiration from the existing x86 logic while
> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
> perf annotate can successfully resolve memory locations and register
> types, enabling comprehensive data type profiling on arm64 platforms.
>
> Example Result
> ==============
>
> # perf mem record -a -K -- sleep 1
> # perf annotate --data-type --type-stat --stdio
> Annotate data type stats:
> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
I'm impressed that the success rate is quite high. But I think you need
to confirm that the findings are correct by taking a close look at each
result. You can try `perf annotate --code-with-type`.
Thanks,
Namhyung
> -----------------------------------------------------------
> 29 : no_sym
> 196 : no_var
> 806 : no_typeinfo
> 82 : bad_offset
> 1370 : insn_track
>
> Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
> ============================================================================
> Percent offset size field
> 100.00 0 0x40 struct page {
> 9.95 0 0x8 long unsigned int flags;
> 52.83 0x8 0x28 union {
> 52.83 0x8 0x28 struct {
> 37.21 0x8 0x10 union {
> 37.21 0x8 0x10 struct list_head lru {
> 37.21 0x8 0x8 struct list_head* next;
> 0.00 0x10 0x8 struct list_head* prev;
> };
> 37.21 0x8 0x10 struct {
> 37.21 0x8 0x8 void* __filler;
> 0.00 0x10 0x4 unsigned int mlock_count;
> ...
>
> Changes since v1 (reworked from Huafei's series):
>
> - Fix inconsistencies in arm64 instruction output across llvm, capstone,
> and objdump disassembly backends.
> - Support arm64-specific addressing modes and operand formats. (Leo Yan)
> - Extend instruction tracking to support mov and add instructions,
> along with per-cpu and stack variables.
> - Include real-world examples in commit messages to demonstrate
> practical effects. (Namhyung Kim)
> - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
> https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
>
> Please let me know if you have any feedback.
>
> Thanks,
> Tengda
>
> [1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
> [2] https://developer.arm.com/documentation/102374/0103
> [3] https://github.com/flynd/asmsheets/releases/tag/v8
>
> ---
>
> Tengda Wu (16):
> perf llvm: Fix arm64 adrp instruction disassembly mismatch with
> objdump
> perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
> perf annotate-arm64: Generalize arm64_mov__parse to support standard
> operands
> perf annotate-arm64: Handle load and store instructions
> perf annotate: Introduce extract_op_location callback for
> arch-specific parsing
> perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
> perf annotate-arm64: Implement extract_op_location() callback
> perf annotate-arm64: Enable instruction tracking support
> perf annotate-arm64: Support load instruction tracking
> perf annotate-arm64: Support store instruction tracking
> perf annotate-arm64: Support stack variable tracking
> perf annotate-arm64: Support 'mov' instruction tracking
> perf annotate-arm64: Support 'add' instruction tracking
> perf annotate-arm64: Support 'adrp' instruction to track global
> variables
> perf annotate-arm64: Support per-cpu variable access tracking
> perf annotate-arm64: Support 'mrs' instruction to track 'current'
> pointer
>
> .../perf/util/annotate-arch/annotate-arm64.c | 642 +++++++++++++++++-
> .../util/annotate-arch/annotate-powerpc.c | 10 +
> tools/perf/util/annotate-arch/annotate-x86.c | 88 ++-
> tools/perf/util/annotate-data.c | 72 +-
> tools/perf/util/annotate-data.h | 7 +-
> tools/perf/util/annotate.c | 108 +--
> tools/perf/util/annotate.h | 12 +
> tools/perf/util/capstone.c | 107 ++-
> tools/perf/util/disasm.c | 5 +
> tools/perf/util/disasm.h | 5 +
> .../util/dwarf-regs-arch/dwarf-regs-arm64.c | 20 +
> tools/perf/util/dwarf-regs.c | 2 +-
> tools/perf/util/include/dwarf-regs.h | 1 +
> tools/perf/util/llvm.c | 50 ++
> 14 files changed, 984 insertions(+), 145 deletions(-)
>
>
> base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb
> --
> 2.34.1
>
* Re: [PATCH v2 02/16] perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
2026-04-03 9:47 ` [PATCH v2 02/16] perf capstone: Fix arm64 jump/adrp " Tengda Wu
@ 2026-04-07 6:43 ` Namhyung Kim
0 siblings, 0 replies; 22+ messages in thread
From: Namhyung Kim @ 2026-04-07 6:43 UTC (permalink / raw)
To: Tengda Wu
Cc: Peter Zijlstra, leo.yan, Li Huafei, Ian Rogers, Kim Phillips,
Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar,
Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm
On Fri, Apr 03, 2026 at 09:47:46AM +0000, Tengda Wu wrote:
> The jump and adrp instructions parsed by libcapstone currently lack
> symbolic representation and use a '#' prefix for addresses. This
> format is inconsistent with objdump's output, which causes subsequent
> parsing in jump__parse() and arm64_mov__parse() to fail.
>
> Example mismatch:
> Current: b #0xffff8000800114c8
> Fix: b ffff8000800114c8 <el0t_64_sync+0x108>
>
> Current: adrp x18, #0xffff800081f5f000
> Fix: adrp x18, ffff800081f5f000 <this_cpu_vector>
>
> Fix this by implementing extended formatting for these arm64
> instructions during symbol__disassemble_capstone(). This ensures
> the output matches objdump's expected style, including the raw
> address and the associated <symbol+offset> suffix.
>
> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> ---
> tools/perf/util/capstone.c | 107 ++++++++++++++++++++++++++++++++-----
> tools/perf/util/disasm.c | 5 ++
> tools/perf/util/disasm.h | 1 +
> 3 files changed, 101 insertions(+), 12 deletions(-)
>
> diff --git a/tools/perf/util/capstone.c b/tools/perf/util/capstone.c
> index 25cf6e15ec27..1d8421d2d98c 100644
> --- a/tools/perf/util/capstone.c
> +++ b/tools/perf/util/capstone.c
> @@ -255,10 +255,6 @@ static void print_capstone_detail(struct cs_insn *insn, char *buf, size_t len,
> struct map *map = args->ms->map;
> struct symbol *sym;
>
> - /* TODO: support more architectures */
> - if (!arch__is_x86(args->arch))
> - return;
> -
> if (insn->detail == NULL)
> return;
>
> @@ -305,6 +301,98 @@ static void print_capstone_detail(struct cs_insn *insn, char *buf, size_t len,
> }
> }
>
> +static void format_capstone_insn_x86(struct cs_insn *insn, char *buf,
> + size_t len, struct annotate_args *args,
> + u64 addr)
> +{
> + int printed;
> +
> + printed = scnprintf(buf, len, " %-7s %s",
> + insn->mnemonic, insn->op_str);
> + buf += printed;
> + len -= printed;
> +
> + print_capstone_detail(insn, buf, len, args, addr);
> +}
> +
> +static void format_capstone_insn_arm64(struct cs_insn *insn, char *buf,
> + size_t len, struct annotate_args *args,
> + u64 addr)
> +{
> + struct map *map = args->ms->map;
> + struct symbol *sym;
> + char *last_imm, *endptr;
> + u64 orig_addr;
> +
> + scnprintf(buf, len, " %-7s %s",
> + insn->mnemonic, insn->op_str);
> + /*
> + * Adjust instructions to keep the existing behavior with objdump.
> + *
> + * Example conversion:
> + * From: b #0xffff8000800114c8
> + * To: b ffff8000800114c8 <el0t_64_sync+0x108>
> + */
> + switch (insn->id) {
> + case ARM64_INS_B:
> + case ARM64_INS_BL:
> + case ARM64_INS_CBNZ:
> + case ARM64_INS_CBZ:
> + case ARM64_INS_TBNZ:
> + case ARM64_INS_TBZ:
> + case ARM64_INS_ADRP:
> + /* Extract last immediate value as address */
> + last_imm = strrchr(buf, '#');
> + if (!last_imm)
> + return;
> +
> + orig_addr = strtoull(last_imm + 1, &endptr, 16);
> + if (endptr == last_imm + 1)
> + return;
> +
> + /* Relocate map that contains the address */
> + if (dso__kernel(map__dso(map))) {
> + map = maps__find(map__kmaps(map), orig_addr);
> + if (map == NULL)
> + return;
I know you copied the logic from x86, but I've realized that it leaks a
refcount for the new kernel map returned from maps__find(). This needs
to be fixed separately.
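A minimal sketch of the ownership pattern in question, assuming the lookup
hands back a reference that the caller must drop. This is a toy model only:
`toy_map`, `toy_maps_find()`, `toy_map_put()` and `toy_symbolize()` are
made-up stand-ins for illustration, not the actual perf types or functions.

```c
#include <stddef.h>

/* Toy stand-in for a refcounted map; not the real struct map. */
struct toy_map {
	unsigned long start, end;
	int refcnt;
};

/* Hypothetical lookup: returns the map with an extra reference taken. */
static struct toy_map *toy_maps_find(struct toy_map *maps, size_t nr,
				     unsigned long addr)
{
	for (size_t i = 0; i < nr; i++) {
		if (addr >= maps[i].start && addr < maps[i].end) {
			maps[i].refcnt++;
			return &maps[i];
		}
	}
	return NULL;
}

/* Hypothetical release: drops the reference taken by the lookup. */
static void toy_map_put(struct toy_map *map)
{
	if (map)
		map->refcnt--;
}

/* Look up an address and drop the reference before returning. */
static int toy_symbolize(struct toy_map *maps, size_t nr, unsigned long addr,
			 unsigned long *rel)
{
	struct toy_map *map = toy_maps_find(maps, nr, addr);

	if (map == NULL)
		return -1;

	*rel = addr - map->start;
	toy_map_put(map);	/* without this the reference would leak */
	return 0;
}
```

In this model the reference count of each map is unchanged after
toy_symbolize() returns, whether or not the lookup succeeded.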
Thanks,
Namhyung
> + }
> +
> + /* Convert it to map-relative address for search */
> + addr = map__map_ip(map, orig_addr);
> +
> + sym = map__find_symbol(map, addr);
> + if (sym == NULL)
> + return;
> +
> + /* Symbolize the resolved address */
> + len = len - (last_imm - buf);
> + if (addr == sym->start) {
> + scnprintf(last_imm, len, "%"PRIx64" <%s>",
> + orig_addr, sym->name);
> + } else {
> + scnprintf(last_imm, len, "%"PRIx64" <%s+%#"PRIx64">",
> + orig_addr, sym->name, addr - sym->start);
> + }
> + break;
> + default:
> + break;
> + }
> +}
> +
> +static void format_capstone_insn(struct cs_insn *insn, char *buf, size_t len,
> + struct annotate_args *args, u64 addr)
> +{
> + /* TODO: support more architectures */
> + if (arch__is_x86(args->arch))
> + format_capstone_insn_x86(insn, buf, len, args, addr);
> + else if (arch__is_arm64(args->arch))
> + format_capstone_insn_arm64(insn, buf, len, args, addr);
> + else {
> + scnprintf(buf, len, " %-7s %s",
> + insn->mnemonic, insn->op_str);
> + }
> +}
> +
> struct find_file_offset_data {
> u64 ip;
> u64 offset;
> @@ -381,14 +469,9 @@ int symbol__disassemble_capstone(const char *filename __maybe_unused,
>
> free_count = count = perf_cs_disasm(handle, buf, buf_len, start, buf_len, &insn);
> for (i = 0, offset = 0; i < count; i++) {
> - int printed;
> -
> - printed = scnprintf(disasm_buf, sizeof(disasm_buf),
> - " %-7s %s",
> - insn[i].mnemonic, insn[i].op_str);
> - print_capstone_detail(&insn[i], disasm_buf + printed,
> - sizeof(disasm_buf) - printed, args,
> - start + offset);
> + format_capstone_insn(&insn[i], disasm_buf,
> + sizeof(disasm_buf), args,
> + start + offset);
>
> args->offset = offset;
> args->line = disasm_buf;
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 40fcaed5d0b1..988b2b748e11 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -202,6 +202,11 @@ bool arch__is_powerpc(const struct arch *arch)
> return arch->id.e_machine == EM_PPC || arch->id.e_machine == EM_PPC64;
> }
>
> +bool arch__is_arm64(const struct arch *arch)
> +{
> + return arch->id.e_machine == EM_AARCH64;
> +}
> +
> static void ins_ops__delete(struct ins_operands *ops)
> {
> if (ops == NULL)
> diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
> index a6e478caf61a..d3730ed86dba 100644
> --- a/tools/perf/util/disasm.h
> +++ b/tools/perf/util/disasm.h
> @@ -111,6 +111,7 @@ struct annotate_args {
> const struct arch *arch__find(uint16_t e_machine, uint32_t e_flags, const char *cpuid);
> bool arch__is_x86(const struct arch *arch);
> bool arch__is_powerpc(const struct arch *arch);
> +bool arch__is_arm64(const struct arch *arch);
>
> extern const struct ins_ops call_ops;
> extern const struct ins_ops dec_ops;
> --
> 2.34.1
>
* Re: [PATCH v2 03/16] perf annotate-arm64: Generalize arm64_mov__parse to support standard operands
2026-04-03 9:47 ` [PATCH v2 03/16] perf annotate-arm64: Generalize arm64_mov__parse to support standard operands Tengda Wu
@ 2026-04-07 6:58 ` Namhyung Kim
0 siblings, 0 replies; 22+ messages in thread
From: Namhyung Kim @ 2026-04-07 6:58 UTC (permalink / raw)
To: Tengda Wu
Cc: Peter Zijlstra, leo.yan, Li Huafei, Ian Rogers, Kim Phillips,
Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar,
Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm
On Fri, Apr 03, 2026 at 09:47:47AM +0000, Tengda Wu wrote:
> The current arm64_mov__parse() implementation strictly requires the
> operand to contain a symbol suffix in the "<symbol>" format. This
> causes the parser to fail for standard instructions that only contain
> raw immediates or registers without symbolic annotations.
>
> Refactor the function to make symbol matching optional. The parser now
> correctly extracts the target operand and only attempts to parse the
> "<symbol>" suffix if it exists. This change also introduces better
> handling for whitespace and comments, and adds support for multi-register
> check via arm64__check_multi_regs(), ensuring compatibility with a
> wider range of arm64 instruction formats.
>
> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> ---
> .../perf/util/annotate-arch/annotate-arm64.c | 85 ++++++++++++++-----
> 1 file changed, 65 insertions(+), 20 deletions(-)
>
> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index 33080fdca125..4c42323b0c18 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> @@ -14,12 +14,38 @@ struct arch_arm64 {
> regex_t jump_insn;
> };
>
> +static bool arm64__check_multi_regs(const char *op)
> +{
> + char *comma = strchr(op, ',');
> +
> + while (comma) {
> + char *next = comma + 1;
> +
> + next = skip_spaces(next);
> +
> + /*
> + * Check the first valid character after the comma:
> + * - If it is '#', it indicates an immediate offset (e.g., [x1, #16]).
> + * - If it is an alphabetic character, it is highly likely a
> + * register name (e.g., x, w, s, d, q, v, p, z).
> + * - Special cases: Alias and control registers like sp, xzr,
> + * and wzr all start with an alphabetic character.
> + */
> + if (*next && *next != '#' && isalpha(*next))
> + return true;
It seems you treat any alphabetic character after a comma as a sign of
multi-regs. Does that mean the first component before the comma must be
another register?
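To make the question concrete, here is a standalone copy of the check that
can be built outside perf. `skip_ws()` is a local stand-in for the kernel's
skip_spaces() and `check_multi_regs()` is just an illustrative name; the
condition is reduced to isalpha() because neither '\0' nor '#' is alphabetic.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for the kernel's skip_spaces() helper. */
static const char *skip_ws(const char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/* Same test as the patch: a letter right after any comma counts. */
static bool check_multi_regs(const char *op)
{
	const char *comma = strchr(op, ',');

	while (comma) {
		const char *next = skip_ws(comma + 1);

		/* Only the text after the comma is inspected. */
		if (isalpha((unsigned char)*next))
			return true;

		comma = strchr(next, ',');
	}

	return false;
}
```

With this copy, "[x1, x2]" is reported as multi-register and "[x1, #16]" is
not, but "#16, x1" is reported as multi-register too, since nothing looks at
the component before the comma.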
> +
> + comma = strchr(next, ',');
> + }
> +
> + return false;
> +}
> +
> static int arm64_mov__parse(const struct arch *arch __maybe_unused,
> struct ins_operands *ops,
> struct map_symbol *ms __maybe_unused,
> struct disasm_line *dl __maybe_unused)
> {
> - char *s = strchr(ops->raw, ','), *target, *endptr;
> + char *s = strchr(ops->raw, ','), *target, *endptr, *comment, prev;
>
> if (s == NULL)
> return -1;
> @@ -31,29 +57,48 @@ static int arm64_mov__parse(const struct arch *arch __maybe_unused,
> if (ops->source.raw == NULL)
> return -1;
>
> - target = ++s;
> + target = skip_spaces(++s);
> + comment = strchr(s, arch->objdump.comment_char);
> +
> + if (comment != NULL)
> + s = comment - 1;
> + else
> + s = strchr(s, '\0') - 1;
An interesting use of strchr(). Oh, I found the strchr(3) man page
also mentions that it's a supported use case. TIL.
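A tiny sketch of the idiom, for reference. The helper name `last_char()` is
made up for illustration; the only library behaviour relied on is that
strchr() treats the terminating null byte as part of the string.

```c
#include <string.h>

/* Return a pointer to the last character of s, or s itself if s is empty. */
static const char *last_char(const char *s)
{
	/* Searching for '\0' yields a pointer to the terminator. */
	const char *end = strchr(s, '\0');

	return end > s ? end - 1 : s;
}
```

So strchr(s, '\0') is equivalent to s + strlen(s), and subtracting one gives
the last character as long as the string is not empty.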
> +
> + while (s > target && isspace(s[0]))
> + --s;
> + s++;
> + prev = *s;
> + *s = '\0';
> ops->target.raw = strdup(target);
> + *s = prev;
> +
> if (ops->target.raw == NULL)
> goto out_free_source;
>
> - ops->target.addr = strtoull(target, &endptr, 16);
> - if (endptr == target)
> - goto out_free_target;
> -
> - s = strchr(endptr, '<');
> - if (s == NULL)
> - goto out_free_target;
> - endptr = strchr(s + 1, '>');
> - if (endptr == NULL)
> - goto out_free_target;
> -
> - *endptr = '\0';
> - *s = ' ';
> - ops->target.name = strdup(s);
> - *s = '<';
> - *endptr = '>';
> - if (ops->target.name == NULL)
> - goto out_free_target;
> + ops->target.multi_regs = arm64__check_multi_regs(ops->target.raw);
> +
> + /* Parse address followed by symbol name, e.g. "addr <symbol>" */
> + if (strchr(target, '<') != NULL) {
> + ops->target.addr = strtoull(target, &endptr, 16);
> + if (endptr == target)
> + goto out_free_target;
Hmm, shouldn't this part be executed regardless of whether a symbol name
is present?
> +
> + s = strchr(endptr, '<');
> + if (s == NULL)
> + goto out_free_target;
It'd be safer to check `if (*skip_spaces(endptr) == '<')` rather than
strchr().
> + endptr = strchr(s + 1, '>');
> + if (endptr == NULL)
> + goto out_free_target;
I'm afraid C++ programs can have symbols containing <> from templates, so
the first '>' may not be the closing one. strrchr() would probably work.
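A minimal sketch combining the two suggestions about '<' and '>', written
as a standalone function rather than a change to arm64_mov__parse().
`parse_addr_sym()`, `skip_ws()` and the example symbol strings are
illustrative assumptions, and the sketch only covers the "addr <symbol>"
form.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for the kernel's skip_spaces() helper. */
static const char *skip_ws(const char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/*
 * Parse "addr <symbol>" from target. On success, store the address in
 * *addr and a malloc'ed copy of the symbol text in *name (caller frees).
 * Returns 0 on success and -1 if the input does not match.
 */
static int parse_addr_sym(const char *target, unsigned long long *addr,
			  char **name)
{
	const char *open, *close;
	char *endptr;
	size_t len;

	*addr = strtoull(target, &endptr, 16);
	if (endptr == target)
		return -1;

	/* The symbol must directly follow the address, not appear later. */
	open = skip_ws(endptr);
	if (*open != '<')
		return -1;

	/* Use the last '>' so that template arguments stay in the name. */
	close = strrchr(open, '>');
	if (close == NULL)
		return -1;

	len = close - open - 1;
	*name = malloc(len + 1);
	if (*name == NULL)
		return -1;
	memcpy(*name, open + 1, len);
	(*name)[len] = '\0';
	return 0;
}
```

For an input like "4005d0 <std::vector<int>::size() const+0x10>" this keeps
the whole text between the first '<' and the last '>' as the name, where a
search for the first '>' would stop after "int".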
Thanks,
Namhyung
> +
> + *endptr = '\0';
> + *s = ' ';
> + ops->target.name = strdup(s);
> + *s = '<';
> + *endptr = '>';
> + if (ops->target.name == NULL)
> + goto out_free_target;
> + }
>
> return 0;
>
> --
> 2.34.1
>
* Re: [PATCH v2 04/16] perf annotate-arm64: Handle load and store instructions
2026-04-03 9:47 ` [PATCH v2 04/16] perf annotate-arm64: Handle load and store instructions Tengda Wu
@ 2026-04-07 7:09 ` Namhyung Kim
0 siblings, 0 replies; 22+ messages in thread
From: Namhyung Kim @ 2026-04-07 7:09 UTC (permalink / raw)
To: Tengda Wu
Cc: Peter Zijlstra, leo.yan, Li Huafei, Ian Rogers, Kim Phillips,
Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar,
Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm
On Fri, Apr 03, 2026 at 09:47:48AM +0000, Tengda Wu wrote:
> Add ldst_ops to handle load and store instructions in order to parse
> the data types and offsets associated with PMU events for memory access
> instructions. There are many variants of load and store instructions in
> ARM64, making it difficult to match all of these instruction names
> completely. Therefore, only the instruction prefixes are matched. The
> prefix 'ld|st' covers most of the memory access instructions, 'cas|swp'
> matches atomic instructions, and 'prf' matches memory prefetch
> instructions.
>
> Signed-off-by: Li Huafei <lihuafei1@huawei.com>
> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> ---
> .../perf/util/annotate-arch/annotate-arm64.c | 72 +++++++++++++++++++
> 1 file changed, 72 insertions(+)
>
> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index 4c42323b0c18..8209faaa6086 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> @@ -3,7 +3,9 @@
> #include <errno.h>
> #include <stdlib.h>
> #include <string.h>
> +#include <ctype.h>
> #include <linux/zalloc.h>
> +#include <linux/string.h>
> #include <regex.h>
> #include "../annotate.h"
> #include "../disasm.h"
> @@ -12,6 +14,7 @@ struct arch_arm64 {
> struct arch arch;
> regex_t call_insn;
> regex_t jump_insn;
> + regex_t ldst_insn; /* load and store instruction */
> };
>
> static bool arm64__check_multi_regs(const char *op)
> @@ -114,6 +117,59 @@ static const struct ins_ops arm64_mov_ops = {
> .scnprintf = mov__scnprintf,
> };
>
> +static int arm64_ldst__parse(const struct arch *arch __maybe_unused,
The 'arch' argument is used below, so __maybe_unused is not needed. :)
> + struct ins_operands *ops,
> + struct map_symbol *ms __maybe_unused,
> + struct disasm_line *dl __maybe_unused)
> +{
> + char *s, *target;
> +
> + /*
> + * The part starting from the memory access annotation '[' is parsed
> + * as 'target', while the part before it is parsed as 'source'.
It'd be nice if you could show some examples. So it always looks like:
  LDR x1, [x2, #4]
  STR x1, [x3], #8
Right? What about other instructions?
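As a rough illustration of the split described in the comment above, here
is a standalone sketch that can be built outside perf. `split_ldst()` and
`copy_n()` are made-up names, '[' is hard-coded instead of coming from
arch->objdump.memory_ref_char, and the operand strings are only examples.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed, NUL-terminated copy of the first n bytes of s. */
static char *copy_n(const char *s, size_t n)
{
	char *out = malloc(n + 1);

	if (out) {
		memcpy(out, s, n);
		out[n] = '\0';
	}
	return out;
}

/*
 * Split a load/store operand string at the first '[': the text before the
 * preceding comma becomes *source, the text from '[' on becomes *target.
 * Both are malloc'ed copies. Returns 0 on success, -1 otherwise.
 */
static int split_ldst(const char *raw, char **source, char **target)
{
	const char *bracket = strchr(raw, '[');
	const char *s = bracket;

	if (bracket == NULL)
		return -1;

	/* Walk back to the comma that separates the registers from '['. */
	while (s > raw && *s != ',')
		--s;
	if (s == raw)
		return -1;

	*source = copy_n(raw, s - raw);
	if (*source == NULL)
		return -1;

	*target = copy_n(bracket, strlen(bracket));
	if (*target == NULL) {
		free(*source);
		*source = NULL;
		return -1;
	}
	return 0;
}
```

With this sketch "x1, [x2, #4]" splits into "x1" and "[x2, #4]",
"x1, [x3], #8" into "x1" and "[x3], #8", and a pair form such as
"x0, x1, [sp, #16]" into "x0, x1" and "[sp, #16]". An operand string
without '[' is rejected.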
> + */
> + target = s = strchr(ops->raw, arch->objdump.memory_ref_char);
> + if (!s)
> + return -1;
> +
> + while (s > ops->raw && *s != ',')
> + --s;
> +
> + if (s == ops->raw)
> + return -1;
> +
> + *s = '\0';
> + ops->source.raw = strdup(ops->raw);
> +
> + *s = ',';
> + if (!ops->source.raw)
> + return -1;
> +
> + ops->source.multi_regs = arm64__check_multi_regs(ops->source.raw);
Probably it's better to set ops->source.mem_ref to false. Then you
won't need to set multi_regs.
Thanks,
Namhyung
> +
> + ops->target.raw = strdup(target);
> + if (!ops->target.raw) {
> + zfree(&ops->source.raw);
> + return -1;
> + }
> + ops->target.mem_ref = true;
> + ops->target.multi_regs = arm64__check_multi_regs(ops->target.raw);
> +
> + return 0;
> +}
> +
> +static int arm64_ldst__scnprintf(const struct ins *ins, char *bf, size_t size,
> + struct ins_operands *ops, int max_ins_name)
> +{
> + return scnprintf(bf, size, "%-*s %s,%s", max_ins_name, ins->name,
> + ops->source.raw, ops->target.raw);
> +}
> +
> +static struct ins_ops arm64_ldst_ops = {
> + .parse = arm64_ldst__parse,
> + .scnprintf = arm64_ldst__scnprintf,
> +};
> +
> static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
> {
> struct arch_arm64 *arm = container_of(arch, struct arch_arm64, arch);
> @@ -124,6 +180,8 @@ static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch,
> ops = &jump_ops;
> else if (!regexec(&arm->call_insn, name, 2, match, 0))
> ops = &call_ops;
> + else if (!regexec(&arm->ldst_insn, name, 2, match, 0))
> + ops = &arm64_ldst_ops;
> else if (!strcmp(name, "ret"))
> ops = &ret_ops;
> else
> @@ -148,6 +206,8 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
> arch->id = *id;
> arch->objdump.comment_char = '/';
> arch->objdump.skip_functions_char = '+';
> + arch->objdump.memory_ref_char = '[';
> + arch->objdump.imm_char = '#';
> arch->associate_instruction_ops = arm64__associate_instruction_ops;
>
> /* bl, blr */
> @@ -161,8 +221,20 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
> if (err)
> goto out_free_call;
>
> + /*
> + * The ARM64 architecture has many variants of load/store instructions.
> + * It is quite challenging to match all of them completely. Here, we
> + * only match the prefixes of these instructions.
> + */
> + err = regcomp(&arm->ldst_insn, "^(ld|st|cas|prf|swp)",
> + REG_EXTENDED);
> + if (err)
> + goto out_free_jump;
> +
> return arch;
>
> +out_free_jump:
> + regfree(&arm->jump_insn);
> out_free_call:
> regfree(&arm->call_insn);
> out_free_arm:
> --
> 2.34.1
>
* Re: [PATCH v2 07/16] perf annotate-arm64: Implement extract_op_location() callback
2026-04-03 9:47 ` [PATCH v2 07/16] perf annotate-arm64: Implement extract_op_location() callback Tengda Wu
@ 2026-04-07 7:26 ` Namhyung Kim
0 siblings, 0 replies; 22+ messages in thread
From: Namhyung Kim @ 2026-04-07 7:26 UTC (permalink / raw)
To: Tengda Wu
Cc: Peter Zijlstra, leo.yan, Li Huafei, Ian Rogers, Kim Phillips,
Mark Rutland, Arnaldo Carvalho de Melo, Ingo Molnar,
Bill Wendling, Nick Desaulniers, Alexander Shishkin,
Adrian Hunter, Zecheng Li, linux-perf-users, linux-kernel, llvm
On Fri, Apr 03, 2026 at 09:47:51AM +0000, Tengda Wu wrote:
> Implement the extract_op_location() callback for the arm64 architecture
> to handle its specific assembly syntax and addressing modes.
>
> The extractor handles:
> 1. Standalone immediate operands (e.g., #0x10).
> 2. Memory references with diverse addressing modes:
> - Signed offset: [base, #imm]
> - Pre-index: [base, #imm]!
> - Post-index: [base], #imm
> 3. Multi-register operands and primary/secondary register extraction.
>
> This enables 'perf annotate' to resolve memory locations and registers
> required for data type profiling on arm64.
>
> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> ---
> .../perf/util/annotate-arch/annotate-arm64.c | 64 +++++++++++++++++++
> tools/perf/util/annotate.c | 12 ++--
> tools/perf/util/annotate.h | 10 +++
> 3 files changed, 81 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index 8209faaa6086..1fe4c503431b 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> @@ -191,6 +191,69 @@ static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch,
> return ops;
> }
>
> +static int extract_op_location_arm64(const struct arch *arch,
> + struct disasm_line *dl __maybe_unused,
> + const char *op_str, int op_idx __maybe_unused,
> + struct annotated_op_loc *op_loc)
> +{
> + const char *s = op_str;
> + char *p = NULL;
> +
> + if (op_str == NULL)
> + return 0;
> +
> + /* Handle standalone immediate operands (e.g., #0x10) */
> + if (*s == arch->objdump.imm_char) {
> + op_loc->offset = strtol(s + 1, &p, 0);
> + if (p && p != s + 1)
> + op_loc->imm = true;
> + return 0;
> + }
> +
> + /*
> + * Handle memory references (e.g., [x0, #8]), identify
> + * arm64 specific addressing modes
> + */
> + if (*s == arch->objdump.memory_ref_char) {
> + op_loc->mem_ref = true;
> +
> + p = strchr(s, ']');
> + if (p == NULL)
> + return -1;
> +
> + /* Pre-index: [base, #imm]! */
> + if (p[1] == '!')
> + op_loc->addr_mode = INSN_ADDR_PRE_INDEX;
> + /* Post-index: [base], #imm */
> + else if (p[1] == ',' && strchr(p + 1, arch->objdump.imm_char))
> + op_loc->addr_mode = INSN_ADDR_POST_INDEX;
> + /* Signed offset: [base{, #imm}] */
> + else
> + op_loc->addr_mode = INSN_ADDR_SIGNED_OFFSET;
> +
> + s++;
> + }
> +
> + /* Extract the primary register */
> + op_loc->reg1 = arch__dwarf_regnum(arch, s);
> + if (op_loc->reg1 == -1)
> + return -1;
> +
> + /* Move to the next symbol of the operand, if any */
> + s = strchr(s, ',');
> + if (s == NULL)
> + return 0;
> + s = skip_spaces(s + 1);
> +
> + /* Parse secondary register or immediate offset */
> + if (op_loc->multi_regs)
> + op_loc->reg2 = arch__dwarf_regnum(arch, s);
> + else if (*s == arch->objdump.imm_char)
> + op_loc->offset = strtol(s + 1, &p, 0);
> +
> + return 0;
> +}
> +
> const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
> const char *cpuid __maybe_unused)
> {
> @@ -209,6 +272,7 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
> arch->objdump.memory_ref_char = '[';
> arch->objdump.imm_char = '#';
> arch->associate_instruction_ops = arm64__associate_instruction_ops;
> + arch->extract_op_location = extract_op_location_arm64;
>
> /* bl, blr */
> err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 1bf69e00d76d..c4d1cb3a7ae4 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -2452,19 +2452,21 @@ int annotate_check_args(void)
>
> int arch__dwarf_regnum(const struct arch *arch, const char *str)
> {
> - const char *p;
> + const char *p = str;
> char *regname, *q;
> int reg;
>
> - p = strchr(str, arch->objdump.register_char);
> - if (p == NULL)
> - return -1;
> + if (arch->objdump.register_char) {
> + p = strchr(str, arch->objdump.register_char);
> + if (p == NULL)
> + return -1;
> + }
>
> regname = strdup(p);
> if (regname == NULL)
> return -1;
>
> - q = strpbrk(regname, ",) ");
> + q = strpbrk(regname, ",)] ");
> if (q)
> *q = '\0';
>
I think it's better to split the changes to this function into a separate
commit earlier in the series. Please add a description of the arm64 asm
format, covering the register prefix and the different memory reference
delimiters. You probably also want to merge the change that removes the
'static' scope here.
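A standalone sketch of the behaviour the hunk above introduces, which may
help when writing that description. `reg_name()` is a made-up helper, not
a perf function, and the operand strings are only examples; it mirrors the
optional prefix lookup and the ",)] " delimiter set from the diff.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the register name found in str into buf. If prefix is non-zero
 * (e.g. '%'), the name starts at the prefix character; otherwise it
 * starts at str itself. The name ends at ',', ')', ']' or ' '.
 * Returns 0 on success, -1 if no name was found or buf is too small.
 */
static int reg_name(const char *str, char prefix, char *buf, size_t size)
{
	const char *p = str;
	size_t len;

	if (prefix) {
		p = strchr(str, prefix);
		if (p == NULL)
			return -1;
	}

	/* Length of the run before the first delimiter (or end of string). */
	len = strcspn(p, ",)] ");
	if (len == 0 || len >= size)
		return -1;

	memcpy(buf, p, len);
	buf[len] = '\0';
	return 0;
}
```

With a '%' prefix, "0x8(%rax)" yields "%rax". With no prefix character,
"x2, #4]" yields "x2" and "sp]" yields "sp", which is why ']' has to be in
the delimiter set.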
> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
> index 71195a27d38f..0391c6a9f011 100644
> --- a/tools/perf/util/annotate.h
> +++ b/tools/perf/util/annotate.h
> @@ -496,12 +496,21 @@ int annotate_check_args(void);
>
> int arch__dwarf_regnum(const struct arch *arch, const char *str);
>
> +enum annotated_addr_mode {
> + INSN_ADDR_NONE = 0,
> +
> + INSN_ADDR_SIGNED_OFFSET,
> + INSN_ADDR_PRE_INDEX,
> + INSN_ADDR_POST_INDEX,
Probably better to have a "PERF" prefix. Maybe PERF_AAM_SIGNED_OFFSET
(AAM from annotated_addr_mode) or PERF_ADDR_MODE_SIGNED_OFFSET?
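A sketch of how the second naming option could look, together with the
suffix-based classification from extract_op_location_arm64(). The
`perf_addr_mode` enum and `classify_addr_mode()` are hypothetical names
used only for illustration; unlike the patch, the sketch returns
PERF_ADDR_MODE_NONE instead of an error when ']' is missing.

```c
#include <string.h>

/* Naming follows the PERF_ADDR_MODE_ suggestion; it is not in the patch. */
enum perf_addr_mode {
	PERF_ADDR_MODE_NONE = 0,
	PERF_ADDR_MODE_SIGNED_OFFSET,
	PERF_ADDR_MODE_PRE_INDEX,
	PERF_ADDR_MODE_POST_INDEX,
};

/* Classify an arm64 memory operand such as "[x0, #8]!" by its suffix. */
static enum perf_addr_mode classify_addr_mode(const char *op)
{
	const char *close;

	if (*op != '[')
		return PERF_ADDR_MODE_NONE;

	close = strchr(op, ']');
	if (close == NULL)
		return PERF_ADDR_MODE_NONE;

	/* Pre-index: [base, #imm]! */
	if (close[1] == '!')
		return PERF_ADDR_MODE_PRE_INDEX;
	/* Post-index: [base], #imm */
	if (close[1] == ',' && strchr(close + 1, '#'))
		return PERF_ADDR_MODE_POST_INDEX;
	/* Signed offset: [base{, #imm}] */
	return PERF_ADDR_MODE_SIGNED_OFFSET;
}
```

Here "[x0, #8]" and "[x0]" map to the signed-offset mode, "[x0, #8]!" to
pre-index and "[x0], #8" to post-index.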
Thanks,
Namhyung
> +};
> +
> /**
> * struct annotated_op_loc - Location info of instruction operand
> * @reg1: First register in the operand
> * @reg2: Second register in the operand
> * @offset: Memory access offset in the operand
> * @segment: Segment selector register
> + * @addr_mode: Addressing mode, only valid if @mem_ref is true
> * @mem_ref: Whether the operand accesses memory
> * @multi_regs: Whether the second register is used
> * @imm: Whether the operand is an immediate value (in offset)
> @@ -511,6 +520,7 @@ struct annotated_op_loc {
> int reg2;
> int offset;
> u8 segment;
> + u8 addr_mode;
> bool mem_ref;
> bool multi_regs;
> bool imm;
> --
> 2.34.1
>
Thread overview: 22+ messages
2026-04-03 9:47 [PATCH v2 00/16] perf arm64: Support data type profiling Tengda Wu
2026-04-03 9:47 ` [PATCH v2 01/16] perf llvm: Fix arm64 adrp instruction disassembly mismatch with objdump Tengda Wu
2026-04-03 9:47 ` [PATCH v2 02/16] perf capstone: Fix arm64 jump/adrp " Tengda Wu
2026-04-07 6:43 ` Namhyung Kim
2026-04-03 9:47 ` [PATCH v2 03/16] perf annotate-arm64: Generalize arm64_mov__parse to support standard operands Tengda Wu
2026-04-07 6:58 ` Namhyung Kim
2026-04-03 9:47 ` [PATCH v2 04/16] perf annotate-arm64: Handle load and store instructions Tengda Wu
2026-04-07 7:09 ` Namhyung Kim
2026-04-03 9:47 ` [PATCH v2 05/16] perf annotate: Introduce extract_op_location callback for arch-specific parsing Tengda Wu
2026-04-03 9:47 ` [PATCH v2 06/16] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64 Tengda Wu
2026-04-03 9:47 ` [PATCH v2 07/16] perf annotate-arm64: Implement extract_op_location() callback Tengda Wu
2026-04-07 7:26 ` Namhyung Kim
2026-04-03 9:47 ` [PATCH v2 08/16] perf annotate-arm64: Enable instruction tracking support Tengda Wu
2026-04-03 9:47 ` [PATCH v2 09/16] perf annotate-arm64: Support load instruction tracking Tengda Wu
2026-04-03 9:47 ` [PATCH v2 10/16] perf annotate-arm64: Support store " Tengda Wu
2026-04-03 9:47 ` [PATCH v2 11/16] perf annotate-arm64: Support stack variable tracking Tengda Wu
2026-04-03 9:47 ` [PATCH v2 12/16] perf annotate-arm64: Support 'mov' instruction tracking Tengda Wu
2026-04-03 9:47 ` [PATCH v2 13/16] perf annotate-arm64: Support 'add' " Tengda Wu
2026-04-03 9:47 ` [PATCH v2 14/16] perf annotate-arm64: Support 'adrp' instruction to track global variables Tengda Wu
2026-04-03 9:47 ` [PATCH v2 15/16] perf annotate-arm64: Support per-cpu variable access tracking Tengda Wu
2026-04-03 9:48 ` [PATCH v2 16/16] perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer Tengda Wu
2026-04-07 6:31 ` [PATCH v2 00/16] perf arm64: Support data type profiling Namhyung Kim