Linux Perf Users
 help / color / mirror / Atom feed
* [PATCH v3 00/21] perf arm64: Support data type profiling
@ 2026-07-01  3:53 Tengda Wu
  2026-07-01  3:53 ` [PATCH v3 01/21] perf capstone: Fix kernel map reference count leak Tengda Wu
                   ` (20 more replies)
  0 siblings, 21 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

This patch series implements data type profiling support for arm64,
enabling 'perf annotate --data-type' to resolve memory locations and
variable types on arm64 platforms.

The main changes since v2 include:
v2: https://lore.kernel.org/all/20260403094800.1418825-1-wutengda@huaweicloud.com/

* Reworking operand parsing, where src/dst are now set based on the
  actual instruction definition rather than always treating the left
  operand as src and the right as dst (Namhyung Kim).

* Register state handling has been improved, with better support for
  TSR_KIND_CONST and proper invalidation of dst for unsupported
  instructions.

* Several bug fixes are included, such as fixing a refcount leak, a
  global type resolving error (James Clark), unconditional call to
  arch->extract_op_location() (sashiko [1]). --itrace=i1i is now enabled
  by default for ARM SPE data type profiling to avoid overlapping
  events (James Clark).

Patch organization
==================

The series is organized as follows:

1. Fix refcount leak in print_capstone_detail(). (Patches 01)

2. Fix disassembly mismatches (Patches 02-03)
   Current perf annotate supports three disassembly backends: llvm,
   capstone, and objdump. On arm64, inconsistencies between the output
   of these backends (specifically llvm/capstone vs. objdump) often
   prevent the tracker from correctly identifying registers and offsets.
   These patches resolve these mismatches, ensuring consistent instruction
   parsing across all supported backends.

3. Infrastructure for arm64 operand parsing (Patches 04-09)
   These patches establish the necessary infrastructure for arm64-specific
   operand handling. This includes implementing new callbacks and data
   structures to manage arm64's unique addressing modes and register sets.
   This foundation is essential for the subsequent type-tracking logic.

4. ARM SPE even handling (Patches 10-11)
   Patch 10 automatically deduplicates overlapping ARM SPE events (e.g.,
   l1d-miss, tlb-access) in 'perf annotate' by retaining only the
   "instructions" event when data type profiling is enabled. Patch 11
   defaults the itrace period to 1 for PERF_ITRACE_PERIOD_INSTRUCTIONS
   to fix zero 'Percent' values in annotate output.

5. Core instruction tracking (Patches 12-21)
   These patches implement the core logic for type tracking on arm64,
   covering several key types of instructions, including:
   * Memory Access: ldr/str variants (including stack-based access).
   * Arithmetic & Data Processing: mov, add, and adrp.
   * Special Access: System register access (mrs) and per-cpu variable
     tracking.

The implementation draws inspiration from the existing x86 logic while
adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
perf annotate can successfully resolve memory locations and register types,
providing basic support for data type profiling on arm64 platforms.

Example Result
==============

# perf mem record -a -K -- sleep 1
# perf annotate --data-type --stdio --type-stat
Annotate data type stats:
total 1138, ok 876 (77.0%), bad 262 (23.0%)
-----------------------------------------------------------
         6 : no_sym
        42 : no_var
       206 : no_typeinfo
         8 : bad_offset
       237 : insn_track

Annotate type: 'struct page' in [kernel.kallsyms] (66948 samples):
============================================================================
 Percent     offset       size  field
  100.00          0       0x40  struct page      {
    9.01          0        0x8      long unsigned int   flags;
   57.99        0x8       0x28      union        {
   57.99        0x8       0x28          struct   {
   33.00        0x8       0x10              union        {
   33.00        0x8       0x10                  struct list_head        lru {
   33.00        0x8        0x8                      struct list_head*   next;
    0.00       0x10        0x8                      struct list_head*   prev;
                                                };
   33.00        0x8       0x10                  struct   {
   33.00        0x8        0x8                      void*       __filler;
    0.00       0x10        0x4                      unsigned int        mlock_count;
   ...

Each patch's type profiling results are as follows:

Patch | no_sym | no_var | no_typeinfo | bad_offset | insn_track | ok(%)
------+--------+--------+-------------+------------+------------+------
0010  | 6      | 493    | -           | -          | -          | 56.2%
0013  | 6      | 42     | 438         | 2          | 11         | 57.1%
0014  | 6      | 42     | 399         | 1          | 51         | 60.6%
0015  | 6      | 42     | 398         | 1          | 52         | 60.7%
0016  | 6      | 42     | 393         | 1          | 57         | 61.2%
0017  | 6      | 42     | 376         | 3          | 72         | 62.5%
0018  | 6      | 42     | 305         | 7          | 139        | 68.4%
0019  | 6      | 42     | 218         | 8          | 225        | 75.9%
0020  | 6      | 42     | 217         | 8          | 226        | 76.0%
0021  | 6      | 42     | 206         | 8          | 237        | 77.0%

A few notes:
* Capstone fails to build when rebased to the base-commit (known issue,
  not yet fixed [4]). I disabled it locally for testing.
* Among the remaining 23% bad annotation, about 14% comes from compiler
  generated prologue or epilogue code (e.g., stp	x21, x22, [sp, #32]).
  No solution has been identified for this so far.

Testing
=======

Tested on arm64 (all passed):

  # perf test -v "perf data type profiling tests"
  81: perf data type profiling tests                                                                  : Ok

  === Test Summary ===
  Passed main tests : 1
  Passed subtests   : 0
  Skipped tests     : 0
  Failed tests      : 0

Tested on x86. The profiling results show no change before/after applying
this patch series:

  before : total 880, ok 711 (80.8%), bad 169 (19.2%)
  after  : total 880, ok 711 (80.8%), bad 169 (19.2%)

Changelog
=========

v2 -> v3:
  - v2: https://lore.kernel.org/all/20260403094800.1418825-1-wutengda@huaweicloud.com/
  - Instead of always parsing the left operand as src and the right operand as
    dst, set them based on the actual instruction definition. (Namhyung Kim)
  - Fix refcount leak in print_capstone_detail().
  - Remove useless '<' check when parsing 'addr <symbol>' in arm64_mov__parse().
  - Add example comments in arm64_ldst__parse().
  - Split arch__dwarf_regnum() changes into a separate commit.
  - Rename annotated_addr_mode enum: INSN_ADDR_* -> PERF_ADDR_MODE_*.
  - Set caller-saved registers in init_type_state().
  - For instructions with addressing mode, always goto adjust_reg_index_state()
    at the end to update the src register state.
  - Handle TSR_KIND_CONST registers for 'mov' and 'add' instructions.
  - Invalidate dst register for all other unsupported instructions.
  - Verify type DIE is task_struct pointer before caching globally.
  - Enable --itrace=i1i by default for ARM SPE data type profiling in 'perf annotate'
    to avoid overlapping event counting for the same instruction. (James Clark)
  - Fix global variable type resolving error in check_matching_type(). (James Clark)
  - Address review comments from sashiko [1]:
    - Fix unconditional call to arch->extract_op_location()
    - Handle multi_regs correctly
    - Fix invalid register state in error path
    - Other misc fixes
v1 -> v2:
  - v1: https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
  - Fix inconsistencies in arm64 instruction output across llvm, capstone,
    and objdump disassembly backends.
  - Support arm64-specific addressing modes and operand formats. (Leo Yan)
  - Extend instruction tracking to support mov and add instructions,
    along with per-cpu and stack variables.
  - Include real-world examples in commit messages to demonstrate
    practical effects. (Namhyung Kim)
  - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.

Please let me know if you have any feedback.

Thanks,
Tengda

[1] https://sashiko.dev/#/patchset/20260403094800.1418825-1-wutengda%40huaweicloud.com
[2] https://developer.arm.com/documentation/102374/0103
[3] https://github.com/flynd/asmsheets/releases/tag/v8
[4] https://lore.kernel.org/all/aiCgbmUtlMCM4Xzt@x1/#t


Tengda Wu (21):
  perf capstone: Fix kernel map reference count leak
  perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
  perf llvm: Fix arm64 adrp instruction disassembly mismatch with
    objdump
  perf annotate-arm64: Generalize arm64_mov__parse to support more
    instructions
  perf annotate-arm64: Handle load and store instructions
  perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
  perf annotate: Adapt arch__dwarf_regnum() for arm64
  perf annotate: Introduce extract_op_location callback for
    arch-specific parsing
  perf annotate-arm64: Implement extract_op_location() callback
  perf annotate: Deduplicate overlapping ARM SPE events for data type
    profiling
  perf auxtrace: Set default period to 1 for
    PERF_ITRACE_PERIOD_INSTRUCTIONS type
  perf annotate-data: Extract invalidate_reg_state() as a common helper
  perf annotate-arm64: Enable instruction tracking support
  perf annotate-arm64: Support load instruction tracking
  perf annotate-arm64: Support store instruction tracking
  perf annotate-arm64: Support stack variable tracking
  perf annotate-arm64: Support 'mov' instruction tracking
  perf annotate-arm64: Support 'add' instruction tracking
  perf annotate-arm64: Support 'adrp' instruction to track global
    variables
  perf annotate-arm64: Support per-cpu variable access tracking
  perf annotate-arm64: Support 'mrs' instruction to track 'current'
    pointer

 tools/perf/builtin-annotate.c                 |  13 +
 .../perf/util/annotate-arch/annotate-arm64.c  | 901 +++++++++++++++++-
 .../util/annotate-arch/annotate-powerpc.c     |  10 +
 tools/perf/util/annotate-arch/annotate-x86.c  |  97 +-
 tools/perf/util/annotate-data.c               |  77 +-
 tools/perf/util/annotate-data.h               |   8 +-
 tools/perf/util/annotate.c                    | 108 +--
 tools/perf/util/annotate.h                    |  12 +
 tools/perf/util/arm-spe.c                     |   1 +
 tools/perf/util/auxtrace.c                    |   6 +
 tools/perf/util/capstone.c                    | 144 ++-
 tools/perf/util/disasm.c                      |   5 +
 tools/perf/util/disasm.h                      |   5 +
 .../util/dwarf-regs-arch/dwarf-regs-arm64.c   |  20 +
 tools/perf/util/dwarf-regs.c                  |   2 +-
 tools/perf/util/include/dwarf-regs.h          |   1 +
 tools/perf/util/llvm.c                        |  50 +
 17 files changed, 1307 insertions(+), 153 deletions(-)


base-commit: 7de6ae9e12207ec146f2f3f1e58d1a99317e88bc
-- 
2.34.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 01/21] perf capstone: Fix kernel map reference count leak
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  3:53 ` [PATCH v3 02/21] perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump Tengda Wu
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

In print_capstone_detail(), maps__find() is used to locate the kernel
map. This function increments the reference count of the found map
object. However, the current implementation fails to call map__put()
after the map is no longer needed, leading to a reference count leak.

Fix this by adding a map__put(map) call to properly release the
reference after use.

Fixes: 92dfc59463d5 ("perf annotate: Add symbol name when using capstone")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 tools/perf/util/capstone.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/capstone.c b/tools/perf/util/capstone.c
index 5ad537fea436..9bba78ee0c5a 100644
--- a/tools/perf/util/capstone.c
+++ b/tools/perf/util/capstone.c
@@ -302,6 +302,7 @@ static void print_capstone_detail(struct cs_insn *insn, char *buf, size_t len,
 	for (i = 0; i < insn->detail->x86.op_count; i++) {
 		struct cs_x86_op *op = &insn->detail->x86.operands[i];
 		u64 orig_addr;
+		struct map *found_map = NULL;
 
 		if (op->type != X86_OP_MEM)
 			continue;
@@ -317,19 +318,22 @@ static void print_capstone_detail(struct cs_insn *insn, char *buf, size_t len,
 		if (dso__kernel(map__dso(map))) {
 			/*
 			 * The kernel maps can be split into sections, let's
-			 * find the map first and the search the symbol.
+			 * find the map first and then search the symbol.
 			 */
-			map = maps__find(map__kmaps(map), addr);
-			if (map == NULL)
+			found_map = maps__find(map__kmaps(map), addr);
+			if (found_map == NULL)
 				continue;
+			map = found_map;
 		}
 
 		/* convert it to map-relative address for search */
 		addr = map__map_ip(map, addr);
 
 		sym = map__find_symbol(map, addr);
-		if (sym == NULL)
+		if (sym == NULL) {
+			map__put(found_map);
 			continue;
+		}
 
 		if (addr == sym->start) {
 			scnprintf(buf, len, "\t# %"PRIx64" <%s>",
@@ -338,6 +342,7 @@ static void print_capstone_detail(struct cs_insn *insn, char *buf, size_t len,
 			scnprintf(buf, len, "\t# %"PRIx64" <%s+%#"PRIx64">",
 				  orig_addr, sym->name, addr - sym->start);
 		}
+		map__put(found_map);
 		break;
 	}
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 02/21] perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
  2026-07-01  3:53 ` [PATCH v3 01/21] perf capstone: Fix kernel map reference count leak Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:07   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 03/21] perf llvm: Fix arm64 adrp instruction " Tengda Wu
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

The jump and adrp instructions parsed by libcapstone currently lack
symbolic representation and use a '#' prefix for addresses. This
format is inconsistent with objdump's output, which causes subsequent
parsing in jump__parse() and arm64_mov__parse() to fail.

Example mismatch:
  Current: b #0xffff8000800114c8
  Fix:     b ffff8000800114c8 <el0t_64_sync+0x108>

  Current: adrp x18, #0xffff800081f5f000
  Fix:     adrp x18, ffff800081f5f000 <this_cpu_vector>

Fix this by implementing extended formatting for these arm64
instructions during symbol__disassemble_capstone(). This ensures
the output matches objdump's expected style, including the raw
address and the associated <symbol+offset> suffix.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 tools/perf/util/capstone.c | 131 +++++++++++++++++++++++++++++++++----
 tools/perf/util/disasm.c   |   5 ++
 tools/perf/util/disasm.h   |   1 +
 3 files changed, 125 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/capstone.c b/tools/perf/util/capstone.c
index 9bba78ee0c5a..a9c0a1d2eb03 100644
--- a/tools/perf/util/capstone.c
+++ b/tools/perf/util/capstone.c
@@ -218,6 +218,12 @@ static int capstone_init(uint16_t e_machine, csh *cs_handle, bool is64, bool is_
 		 * on x86 by investigating instruction details.
 		 */
 		perf_cs_option(*cs_handle, CS_OPT_DETAIL, CS_OPT_ON);
+	} else if (machine__normalized_is(machine, "arm64")) {
+		/*
+		 * Same as x86: arm64 needs instruction details to resolve
+		 * symbolic addresses
+		 */
+		perf_cs_option(*cs_handle, CS_OPT_DETAIL, CS_OPT_ON);
 	}
 
 	return 0;
@@ -292,10 +298,6 @@ static void print_capstone_detail(struct cs_insn *insn, char *buf, size_t len,
 	struct map *map = args->ms->map;
 	struct symbol *sym;
 
-	/* TODO: support more architectures */
-	if (!arch__is_x86(args->arch))
-		return;
-
 	if (insn->detail == NULL)
 		return;
 
@@ -347,6 +349,116 @@ static void print_capstone_detail(struct cs_insn *insn, char *buf, size_t len,
 	}
 }
 
+static int print_default_format(struct cs_insn *insn, char *buf, size_t len)
+{
+	return scnprintf(buf, len, "       %-7s %s",
+			 insn->mnemonic, insn->op_str);
+}
+
+static void format_capstone_insn_x86(struct cs_insn *insn, char *buf,
+				     size_t len, struct annotate_args *args,
+				     u64 addr)
+{
+	int printed;
+
+	printed = print_default_format(insn, buf, len);
+	buf += printed;
+	len -= printed;
+
+	print_capstone_detail(insn, buf, len, args, addr);
+}
+
+static bool is_pc_relative_insn(struct cs_insn *insn)
+{
+	int i;
+
+	if (insn->id == ARM64_INS_ADR || insn->id == ARM64_INS_ADRP)
+		return true;
+
+	if (insn->detail == NULL)
+		return false;
+
+	for (i = 0; i < insn->detail->groups_count; i++) {
+		if (insn->detail->groups[i] == ARM64_GRP_JUMP ||
+		    insn->detail->groups[i] == ARM64_GRP_CALL ||
+		    insn->detail->groups[i] == ARM64_GRP_BRANCH_RELATIVE)
+			return true;
+	}
+
+	return false;
+}
+
+static void format_capstone_insn_arm64(struct cs_insn *insn, char *buf,
+				       size_t len, struct annotate_args *args,
+				       u64 addr)
+{
+	struct map *map = args->ms->map;
+	struct symbol *sym;
+	char *last_imm, *endptr;
+	u64 orig_addr;
+	struct map *found_map = NULL;
+
+	print_default_format(insn, buf, len);
+	/*
+	 * Adjust instructions to keep the existing behavior with objdump.
+	 *
+	 * Example conversion:
+	 * From: b #0xffff8000800114c8
+	 * To:   b ffff8000800114c8 <el0t_64_sync+0x108>
+	 */
+	if (is_pc_relative_insn(insn)) {
+		/* Extract last immediate value as address */
+		last_imm = strrchr(buf, '#');
+		if (!last_imm)
+			return;
+
+		orig_addr = strtoull(last_imm + 1, &endptr, 16);
+		if (endptr == last_imm + 1)
+			return;
+
+		/* Relocate map that contains the address */
+		if (dso__kernel(map__dso(map))) {
+			found_map = maps__find(map__kmaps(map), orig_addr);
+			if (found_map == NULL)
+				return;
+			map = found_map;
+		}
+
+		/* Convert it to map-relative address for search */
+		addr = map__map_ip(map, orig_addr);
+
+		sym = map__find_symbol(map, addr);
+		if (sym == NULL) {
+			map__put(found_map);
+			return;
+		}
+
+		/* Symbolize the resolved address */
+		len = len - (last_imm - buf);
+		if (addr == sym->start) {
+			scnprintf(last_imm, len, "%"PRIx64" <%s>",
+				  orig_addr, sym->name);
+		} else {
+			scnprintf(last_imm, len, "%"PRIx64" <%s+%#"PRIx64">",
+				  orig_addr, sym->name, addr - sym->start);
+		}
+		map__put(found_map);
+	}
+}
+
+static void format_capstone_insn(struct cs_insn *insn, char *buf, size_t len,
+				 struct annotate_args *args, u64 addr)
+{
+	/* TODO: support more architectures */
+	if (arch__is_x86(args->arch))
+		format_capstone_insn_x86(insn, buf, len, args, addr);
+	else if (arch__is_arm64(args->arch))
+		format_capstone_insn_arm64(insn, buf, len, args, addr);
+	else {
+		print_default_format(insn, buf, len);
+	}
+}
+
 struct find_file_offset_data {
 	u64 ip;
 	u64 offset;
@@ -426,14 +538,9 @@ int symbol__disassemble_capstone(const char *filename, struct symbol *sym,
 
 	free_count = count = perf_cs_disasm(handle, buf, buf_len, start, buf_len, &insn);
 	for (i = 0, offset = 0; i < count; i++) {
-		int printed;
-
-		printed = scnprintf(disasm_buf, sizeof(disasm_buf),
-				    "       %-7s %s",
-				    insn[i].mnemonic, insn[i].op_str);
-		print_capstone_detail(&insn[i], disasm_buf + printed,
-				      sizeof(disasm_buf) - printed, args,
-				      start + offset);
+		format_capstone_insn(&insn[i], disasm_buf,
+				     sizeof(disasm_buf), args,
+				     start + offset);
 
 		args->offset = offset;
 		args->line = disasm_buf;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 0a1a7e9cf3ef..bd28829cb00d 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -203,6 +203,11 @@ bool arch__is_powerpc(const struct arch *arch)
 	return arch->id.e_machine == EM_PPC || arch->id.e_machine == EM_PPC64;
 }
 
+bool arch__is_arm64(const struct arch *arch)
+{
+	return arch->id.e_machine == EM_AARCH64;
+}
+
 static void ins_ops__delete(struct ins_operands *ops)
 {
 	if (ops == NULL)
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 25756e3f47e4..06c664fd4fc2 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -111,6 +111,7 @@ struct annotate_args {
 const struct arch *arch__find(uint16_t e_machine, uint32_t e_flags, const char *cpuid);
 bool arch__is_x86(const struct arch *arch);
 bool arch__is_powerpc(const struct arch *arch);
+bool arch__is_arm64(const struct arch *arch);
 
 extern const struct ins_ops call_ops;
 extern const struct ins_ops dec_ops;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 03/21] perf llvm: Fix arm64 adrp instruction disassembly mismatch with objdump
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
  2026-07-01  3:53 ` [PATCH v3 01/21] perf capstone: Fix kernel map reference count leak Tengda Wu
  2026-07-01  3:53 ` [PATCH v3 02/21] perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:05   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 04/21] perf annotate-arm64: Generalize arm64_mov__parse to support more instructions Tengda Wu
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

The operands of 'adrp' instructions parsed by libllvm are currently
represented as raw immediates rather than the "address <symbol+offset>"
format used by objdump. This inconsistency causes arm64_mov__parse()
to fail when parsing these instructions during post-processing.

Example of the mismatch:
  Current: adrp x18, 8014
  Fix:     adrp x18, ffff800081f5f000 <this_cpu_vector>

Fix this by manually extracting the target address from the raw adrp
instruction via symbol_lookup_callback(). The address is then converted
to a specific symbol during symbol__disassemble_llvm() and formatted
to match objdump's output, ensuring compatibility with existing
parsers.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 tools/perf/util/llvm.c | 50 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/tools/perf/util/llvm.c b/tools/perf/util/llvm.c
index a0deb742a733..533d47e8084d 100644
--- a/tools/perf/util/llvm.c
+++ b/tools/perf/util/llvm.c
@@ -94,6 +94,7 @@ static void init_llvm(void)
 struct symbol_lookup_storage {
 	u64 branch_addr;
 	u64 pcrel_load_addr;
+	u64 pcrel_adrp_addr;
 };
 
 static const char *
@@ -108,6 +109,18 @@ symbol_lookup_callback(void *disinfo, uint64_t value,
 		storage->branch_addr = value;
 	else if (*ref_type == LLVMDisassembler_ReferenceType_In_PCrel_Load)
 		storage->pcrel_load_addr = value;
+	else if (*ref_type == LLVMDisassembler_ReferenceType_In_ARM64_ADRP) {
+		uint64_t adrp_imm;
+
+		/* immhi (bits 23:5) and immlo (bits 30:29) */
+		adrp_imm = ((value & 0x00ffffe0) >> 3) | ((value >> 29) & 0x3);
+		/* Sign-extend the 21-bit immediate to 64-bit */
+		if (adrp_imm & (1ULL << 20))
+			adrp_imm |= ~((1ULL << 21) - 1);
+
+		/* Calculate the target page address */
+		storage->pcrel_adrp_addr = (address & ~0xFFFLL) + (adrp_imm << 12);
+	}
 	*ref_type = LLVMDisassembler_ReferenceType_InOut_None;
 	return NULL;
 }
@@ -204,6 +217,7 @@ int symbol__disassemble_llvm(const char *filename, struct symbol *sym,
 
 		storage.branch_addr = 0;
 		storage.pcrel_load_addr = 0;
+		storage.pcrel_adrp_addr = 0;
 
 		/*
 		 * LLVM's API has the code be disassembled as non-const, cast
@@ -227,6 +241,42 @@ int symbol__disassemble_llvm(const char *filename, struct symbol *sym,
 				free(name);
 			}
 		}
+		if (storage.pcrel_adrp_addr != 0) {
+			/*
+			 * ADRP (Address Page) instructions encode a 21-bit signed
+			 * immediate offset relative to the current PC's page.
+			 *
+			 * To maintain consistency with standard objdump output,
+			 * we truncate the raw encoded immediate at the comma
+			 * and replace it with the resolved absolute page address.
+			 *
+			 * Example conversion:
+			 * From: adrp x18, 8014
+			 * To:   adrp x18, ffff800081f5f000 <this_cpu_vector>
+			 */
+			char *name;
+			char *s = strchr(disasm_buf, ',');
+
+			if (s == NULL)
+				goto err;
+
+			s++;
+			*s = '\0';
+			disasm_len = strlen(disasm_buf);
+			disasm_len += scnprintf(disasm_buf + disasm_len,
+						sizeof(disasm_buf) - disasm_len,
+						" %"PRIx64,
+						storage.pcrel_adrp_addr);
+			name = llvm_name_for_data(dso, filename,
+						  storage.pcrel_adrp_addr);
+			if (name) {
+				disasm_len += scnprintf(disasm_buf + disasm_len,
+							sizeof(disasm_buf) -
+							disasm_len,
+							" <%s>", name);
+				free(name);
+			}
+		}
 		if (storage.pcrel_load_addr != 0) {
 			char *name = llvm_name_for_data(dso, filename,
 							storage.pcrel_load_addr);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 04/21] perf annotate-arm64: Generalize arm64_mov__parse to support more instructions
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (2 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 03/21] perf llvm: Fix arm64 adrp instruction " Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:03   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 05/21] perf annotate-arm64: Handle load and store instructions Tengda Wu
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

As the default parser for arm64 instructions, arm64_mov__parse()
currently only supports parsing instructions with the 'addr <symbol>'
suffix. Other instructions are not supported, causing a lack of
source and target information for most regular instructions.

Generalize arm64_mov__parse to extend its parsing capabilities:
1. Parse 'ops->target.raw' and 'ops->source.raw' by default.
2. Utilize arm64__check_multi_regs() to identify if 'ops->source.raw'
   contains multiple registers.
3. Maintain backward compatibility for instructions containing the
   'addr <symbol>' suffix.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 118 ++++++++++++++----
 1 file changed, 95 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 33080fdca125..a1667237114a 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -4,6 +4,8 @@
 #include <stdlib.h>
 #include <string.h>
 #include <linux/zalloc.h>
+#include <linux/string.h>
+#include <linux/ctype.h>
 #include <regex.h>
 #include "../annotate.h"
 #include "../disasm.h"
@@ -14,59 +16,129 @@ struct arch_arm64 {
 	regex_t jump_insn;
 };
 
-static int arm64_mov__parse(const struct arch *arch __maybe_unused,
+static bool arm64__is_reg(const char *op)
+{
+	if (!op || !*op)
+		return false;
+
+	/*
+	 * General-purpose registers: x0-x30, w0-w30.
+	 * Check for 'x' or 'w' prefix followed by a numeric index.
+	 */
+	if ((op[0] == 'x' || op[0] == 'w') && isdigit(op[1]))
+		return true;
+
+	/*
+	 * Special-purpose registers:
+	 * sp: stack pointer
+	 * xzr/wzr: zero registers
+	 */
+	if (!strncmp(op, "sp", 2) || !strncmp(op, "xzr", 3) ||
+	    !strncmp(op, "wzr", 3))
+		return true;
+
+	return false;
+}
+
+static bool arm64__check_multi_regs(const struct arch *arch, const char *op)
+{
+	const char *p = op;
+	int reg_count = 0;
+
+	while (p && *p) {
+		p = skip_spaces(p);
+		if (*p == arch->objdump.memory_ref_char)
+			p++;
+
+		if (arm64__is_reg(p))
+			reg_count++;
+
+		if (reg_count >= 2)
+			return true;
+
+		/* Move to next operand after comma */
+		p = strchr(p, ',');
+		if (p)
+			p++;
+	}
+
+	return false;
+}
+
+static int arm64_mov__parse(const struct arch *arch,
 			    struct ins_operands *ops,
 			    struct map_symbol *ms __maybe_unused,
 			    struct disasm_line *dl __maybe_unused)
 {
-	char *s = strchr(ops->raw, ','), *target, *endptr;
+	char *s = strchr(ops->raw, ','), *source, *endptr, *comment, prev;
 
 	if (s == NULL)
 		return -1;
 
 	*s = '\0';
-	ops->source.raw = strdup(ops->raw);
+	ops->target.raw = strdup(ops->raw);
 	*s = ',';
 
-	if (ops->source.raw == NULL)
+	if (ops->target.raw == NULL)
 		return -1;
 
-	target = ++s;
-	ops->target.raw = strdup(target);
-	if (ops->target.raw == NULL)
-		goto out_free_source;
+	/* Parse source, discarding comment if present */
+	source = skip_spaces(++s);
+	comment = strchr(s, arch->objdump.comment_char);
+
+	if (comment != NULL)
+		s = comment - 1;
+	else
+		s = strchr(s, '\0') - 1;
 
-	ops->target.addr = strtoull(target, &endptr, 16);
-	if (endptr == target)
-		goto out_free_target;
+	while (s > source && isspace(s[0]))
+		--s;
+	s++;
+	prev = *s;
+	*s = '\0';
+	ops->source.raw = strdup(source);
+	*s = prev;
 
+	if (ops->source.raw == NULL) {
+		zfree(&ops->target.raw);
+		return -1;
+	}
+
+	ops->source.multi_regs = arm64__check_multi_regs(arch, ops->source.raw);
+
+	/* Parse address from source (if any) */
+	ops->source.addr = strtoull(source, &endptr, 16);
+	if (endptr == source)
+		return 0;
+
+	/* Parse a symbol followed by an address, if present */
 	s = strchr(endptr, '<');
 	if (s == NULL)
-		goto out_free_target;
-	endptr = strchr(s + 1, '>');
+		return 0;
+	endptr = strrchr(s + 1, '>');
 	if (endptr == NULL)
-		goto out_free_target;
+		return 0;
 
 	*endptr = '\0';
 	*s = ' ';
-	ops->target.name = strdup(s);
+	s = skip_spaces(++s);
+	ops->source.name = strdup(s);
 	*s = '<';
 	*endptr = '>';
-	if (ops->target.name == NULL)
-		goto out_free_target;
 
 	return 0;
+}
 
-out_free_target:
-	zfree(&ops->target.raw);
-out_free_source:
-	zfree(&ops->source.raw);
-	return -1;
+static int arm64_mov__scnprintf(const struct ins *ins, char *bf, size_t size,
+				struct ins_operands *ops, int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s, %s", max_ins_name, ins->name,
+			 ops->target.raw, ops->source.name ?: ops->source.raw);
 }
 
 static const struct ins_ops arm64_mov_ops = {
 	.parse	   = arm64_mov__parse,
-	.scnprintf = mov__scnprintf,
+	.scnprintf = arm64_mov__scnprintf,
 };
 
 static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 05/21] perf annotate-arm64: Handle load and store instructions
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (3 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 04/21] perf annotate-arm64: Generalize arm64_mov__parse to support more instructions Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:07   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 06/21] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64 Tengda Wu
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Add ldst_ops to handle load and store instructions in order to parse
the data types and offsets associated with PMU events for memory access
instructions. There are many variants of load and store instructions in
ARM64, making it difficult to match all of these instruction names
completely. Therefore, only the instruction prefixes are matched. The
prefix 'ld|st' covers most of the memory access instructions, 'cas|swp'
matches atomic instructions, and 'prf' matches memory prefetch
instructions.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index a1667237114a..451500def8c1 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -14,6 +14,7 @@ struct arch_arm64 {
 	struct arch arch;
 	regex_t call_insn;
 	regex_t jump_insn;
+	regex_t ldst_insn; /* load and store instruction */
 };
 
 static bool arm64__is_reg(const char *op)
@@ -141,6 +142,104 @@ static const struct ins_ops arm64_mov_ops = {
 	.scnprintf = arm64_mov__scnprintf,
 };
 
+static bool arm64__insn_is_source_first(struct disasm_line *dl)
+{
+	/*
+	 * Store instructions invert the standard syntax by placing the source
+	 * register before the destination memory address.
+	 */
+	return !strncmp(dl->ins.name, "st", 2);
+}
+
+/*
+ * This function is used to parse arm64 load/store instructions into
+ * instruction operands.
+ *
+ * Typical instructions and their parsing logic:
+ *
+ * 1. Immediate offset:
+ *    ldr   x2, [x0]                -> target="x2", source="[x0]"
+ *    ldr   x2, [x0, #24]           -> target="x2", source="[x0, #24]"
+ *    ldp   x19, x20, [sp, #16]     -> target="x19, x20", source="[sp, #16]"
+ *
+ * 2. Pre-index addressing:
+ *    stp   x29, x30, [sp, #-64]!   -> target="[sp, #-64]!", source="x29, x30"
+ *
+ * 3. Post-index addressing:
+ *    str   x1, [x0], #8            -> target="[x0], #8", source="x1"
+ *    ldr   w1, [x21], #4           -> target="w1", source="[x21], #4"
+ *    ldp   x29, x30, [sp], #32     -> target="x29, x30", source="[sp], #32"
+ *
+ * 4. Register offset / extension:
+ *    ldr   x0, [x1, w0, sxtw #3]   -> target="x0", source="[x1, w0, sxtw #3]"
+ *    ldr   x0, [x1, x0, lsl #3]    -> target="x0", source="[x1, x0, lsl #3]"
+ *
+ * 5. Atomic operations:
+ *    cas   w3, w1, [x0]            -> target="w3, w1", source="[x0]"
+ *    swp   x3, x0, [x2]            -> target="x3, x0", source="[x2]"
+ *
+ * 6. Prefetch memory:
+ *    prfm  pstl1strm, [x4]         -> target="pstl1strm", source="[x4]"
+ *
+ * Parsing strategy:
+ * Use the '[' bracket as the boundary to split the operands into left
+ * and right sides. For non-store instructions, the left side is the
+ * target and the right side is the source. For store instructions, the
+ * roles are reversed.
+ */
+static int arm64_ldst__parse(const struct arch *arch,
+			     struct ins_operands *ops,
+			     struct map_symbol *ms __maybe_unused,
+			     struct disasm_line *dl)
+{
+	char *s, *left, *right;
+
+	right = s = strchr(ops->raw, arch->objdump.memory_ref_char);
+	if (!s)
+		return -1;
+
+	while (s > ops->raw && *s != ',')
+		--s;
+
+	if (s == ops->raw)
+		return -1;
+
+	*s = '\0';
+	left = strdup(ops->raw);
+
+	*s = ',';
+	if (!left)
+		return -1;
+
+	right = strdup(right);
+	if (!right) {
+		zfree(&left);
+		return -1;
+	}
+
+	if (arm64__insn_is_source_first(dl)) {
+		ops->source.raw = left;
+		ops->source.mem_ref = false;
+
+		ops->target.raw = right;
+		ops->target.mem_ref = true;
+		ops->target.multi_regs = arm64__check_multi_regs(arch, ops->target.raw);
+	} else {
+		ops->source.raw = right;
+		ops->source.mem_ref = true;
+		ops->source.multi_regs = arm64__check_multi_regs(arch, ops->source.raw);
+
+		ops->target.raw = left;
+		ops->target.mem_ref = false;
+	}
+
+	return 0;
+}
+
+static struct ins_ops arm64_ldst_ops = {
+	.parse	   = arm64_ldst__parse,
+};
+
 static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch, const char *name)
 {
 	struct arch_arm64 *arm = container_of(arch, struct arch_arm64, arch);
@@ -151,6 +250,8 @@ static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch,
 		ops = &jump_ops;
 	else if (!regexec(&arm->call_insn, name, 2, match, 0))
 		ops = &call_ops;
+	else if (!regexec(&arm->ldst_insn, name, 2, match, 0))
+		ops = &arm64_ldst_ops;
 	else if (!strcmp(name, "ret"))
 		ops = &ret_ops;
 	else
@@ -175,6 +276,8 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 	arch->id = *id;
 	arch->objdump.comment_char	  = '/';
 	arch->objdump.skip_functions_char = '+';
+	arch->objdump.memory_ref_char	  = '[';
+	arch->objdump.imm_char		  = '#';
 	arch->associate_instruction_ops   = arm64__associate_instruction_ops;
 
 	/* bl, blr */
@@ -188,8 +291,20 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 	if (err)
 		goto out_free_call;
 
+	/*
+	 * The ARM64 architecture has many variants of load/store instructions.
+	 * It is quite challenging to match all of them completely. Here, we
+	 * only match the prefixes of these instructions.
+	 */
+	err = regcomp(&arm->ldst_insn, "^(ld|st|cas|prf|swp)",
+		      REG_EXTENDED);
+	if (err)
+		goto out_free_jump;
+
 	return arch;
 
+out_free_jump:
+	regfree(&arm->jump_insn);
 out_free_call:
 	regfree(&arm->call_insn);
 out_free_arm:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 06/21] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (4 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 05/21] perf annotate-arm64: Handle load and store instructions Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:07   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 07/21] perf annotate: Adapt arch__dwarf_regnum() " Tengda Wu
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

The current arm64 DWARF register lookup relies on 'aarch64_regstr_tbl',
a static string table. While this works for kprobe-tracer where register
names start with '%', it is insufficient for parsing register numbers
directly from raw instructions (e.g., extracting '6' from 'x6' or 'w6')
during annotation.

Since get_dwarf_regnum() is currently used only by 'perf annotate' and
does not affect kprobe-tracer, replace the limited table-based lookup
with a programmatic implementation in __get_dwarf_regnum_arm64(). This
allows resolving arm64 register names (x0-x30, w0-w30, sp, etc.) directly
into their corresponding DWARF register numbers.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../util/dwarf-regs-arch/dwarf-regs-arm64.c   | 20 +++++++++++++++++++
 tools/perf/util/dwarf-regs.c                  |  2 +-
 tools/perf/util/include/dwarf-regs.h          |  1 +
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
index 593ca7d4fccc..be55fc2a4f38 100644
--- a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
+++ b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <errno.h>
+#include <ctype.h>
 #include <dwarf-regs.h>
 #include "../../../arch/arm64/include/uapi/asm/perf_regs.h"
 
@@ -10,3 +11,22 @@ int __get_dwarf_regnum_for_perf_regnum_arm64(int perf_regnum)
 
 	return perf_regnum;
 }
+
+int __get_dwarf_regnum_arm64(const char *name)
+{
+	int reg;
+
+	if (!strcmp(name, "sp") || !strcmp(name, "wzr") || !strcmp(name, "xzr"))
+		return 31;
+
+	if (*name != 'x' && *name != 'w')
+		return -ENOENT;
+
+	name++;
+	if (!isdigit(*name))
+		return -ENOENT;
+
+	reg = strtol(name, NULL, 10);
+
+	return reg >= 0 && reg <= 30 ? reg : -ENOENT;
+}
diff --git a/tools/perf/util/dwarf-regs.c b/tools/perf/util/dwarf-regs.c
index 797f455eba0d..bacf5c13c3bc 100644
--- a/tools/perf/util/dwarf-regs.c
+++ b/tools/perf/util/dwarf-regs.c
@@ -114,7 +114,7 @@ int get_dwarf_regnum(const char *name, unsigned int machine, unsigned int flags)
 		reg = _get_dwarf_regnum(arm_regstr_tbl, name);
 		break;
 	case EM_AARCH64:
-		reg = _get_dwarf_regnum(aarch64_regstr_tbl, name);
+		reg = __get_dwarf_regnum_arm64(name);
 		break;
 	case EM_CSKY:
 		reg = __get_csky_regnum(name, flags);
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 46a764cf322f..a25f038bbff2 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -105,6 +105,7 @@ int __get_dwarf_regnum_x86_64(const char *name);
 int __get_dwarf_regnum_for_perf_regnum_i386(int perf_regnum);
 int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum);
 
+int __get_dwarf_regnum_arm64(const char *name);
 int __get_dwarf_regnum_for_perf_regnum_arm(int perf_regnum);
 int __get_dwarf_regnum_for_perf_regnum_arm64(int perf_regnum);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 07/21] perf annotate: Adapt arch__dwarf_regnum() for arm64
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (5 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 06/21] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64 Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  3:53 ` [PATCH v3 08/21] perf annotate: Introduce extract_op_location callback for arch-specific parsing Tengda Wu
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Currently, arch__dwarf_regnum() assumes that all architectures use a
register prefix character (e.g., '%' for x86) defined by
arch->objdump.register_char, and uses it to match register names in
objdump output. However, this assumption does not hold for arm64,
where assembly syntax uses bare register names like 'x0', 'w1'
without any prefix.

As a result, arm64 builds may fail to correctly recognize register
names from objdump disassembly, leading to incomplete or incorrect
annotation output.

To address this:

- Make the register prefix check optional, allowing architectures
  without a prefix character to be parsed correctly.

- Extend the delimiter set in strpbrk() to include the closing square
  bracket ']'. In arm64 assembly, memory operands often use bracketed
  syntax such as '[x1, #16]' or '[x2]'. Adding ']' ensures clean
  extraction of register names like 'x2' without trailing characters.

- Remove the 'static' qualifier from arch__dwarf_regnum() so that it
  can be reused by other architecture-specific profiling components
  in future changes.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 tools/perf/util/annotate.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 53b2a224b21d..ecd367582315 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2449,21 +2449,23 @@ int annotate_check_args(void)
 	return 0;
 }
 
-static int arch__dwarf_regnum(const struct arch *arch, const char *str)
+int arch__dwarf_regnum(const struct arch *arch, const char *str)
 {
-	const char *p;
+	const char *p = str;
 	char *regname, *q;
 	int reg;
 
-	p = strchr(str, arch->objdump.register_char);
-	if (p == NULL)
-		return -1;
+	if (arch->objdump.register_char) {
+		p = strchr(str, arch->objdump.register_char);
+		if (p == NULL)
+			return -1;
+	}
 
 	regname = strdup(p);
 	if (regname == NULL)
 		return -1;
 
-	q = strpbrk(regname, ",) ");
+	q = strpbrk(regname, ",)] ");
 	if (q)
 		*q = '\0';
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 08/21] perf annotate: Introduce extract_op_location callback for arch-specific parsing
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (6 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 07/21] perf annotate: Adapt arch__dwarf_regnum() " Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:06   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 09/21] perf annotate-arm64: Implement extract_op_location() callback Tengda Wu
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Assembly syntax for operands varies significantly across different
architectures, which prevents the operand location (op_loc) parsing
logic in annotate_get_insn_location() from being directly reused.

To simplify the core logic and improve maintainability, move the
operand parsing inside the for_each_insn_op_loc loop into arch-specific
extract_op_location callbacks. This refactoring is intended to be a
cleanup with no functional changes.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../util/annotate-arch/annotate-powerpc.c     | 10 ++
 tools/perf/util/annotate-arch/annotate-x86.c  | 82 ++++++++++++++++
 tools/perf/util/annotate.c                    | 94 ++-----------------
 tools/perf/util/annotate.h                    |  2 +
 tools/perf/util/disasm.h                      |  4 +
 5 files changed, 105 insertions(+), 87 deletions(-)

diff --git a/tools/perf/util/annotate-arch/annotate-powerpc.c b/tools/perf/util/annotate-arch/annotate-powerpc.c
index 218207b52581..8d0b8def5955 100644
--- a/tools/perf/util/annotate-arch/annotate-powerpc.c
+++ b/tools/perf/util/annotate-arch/annotate-powerpc.c
@@ -390,6 +390,15 @@ static void update_insn_state_powerpc(struct type_state *state,
 }
 #endif /* HAVE_LIBDW_SUPPORT */
 
+static int extract_op_location_powerpc(const struct arch *arch __maybe_unused,
+				       struct disasm_line *dl,
+				       const char *op_str __maybe_unused, int op_idx,
+				       struct annotated_op_loc *op_loc)
+{
+	get_powerpc_regs(dl->raw.raw_insn, !op_idx, op_loc);
+	return 0;
+}
+
 const struct arch *arch__new_powerpc(const struct e_machine_and_e_flags *id,
 				     const char *cpuid __maybe_unused)
 {
@@ -406,5 +415,6 @@ const struct arch *arch__new_powerpc(const struct e_machine_and_e_flags *id,
 #ifdef HAVE_LIBDW_SUPPORT
 	arch->update_insn_state = update_insn_state_powerpc;
 #endif
+	arch->extract_op_location = extract_op_location_powerpc;
 	return arch;
 }
diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
index 7e6136536393..6a4b4bdcd7a7 100644
--- a/tools/perf/util/annotate-arch/annotate-x86.c
+++ b/tools/perf/util/annotate-arch/annotate-x86.c
@@ -2,6 +2,7 @@
 #include <string.h>
 #include <linux/compiler.h>
 #include <linux/zalloc.h>
+#include <linux/ctype.h>
 #include <assert.h>
 #include <inttypes.h>
 #include "../annotate-data.h"
@@ -809,6 +810,86 @@ static void update_insn_state_x86(struct type_state *state,
 }
 #endif
 
+/*
+ * Get register number and access offset from the given instruction.
+ * It assumes AT&T x86 asm format like OFFSET(REG).  Maybe it needs
+ * to revisit the format when it handles different architecture.
+ * Fills @reg and @offset when return 0.
+ */
+static int extract_reg_offset(const struct arch *arch, const char *str,
+			      struct annotated_op_loc *op_loc)
+{
+	char *p;
+
+	if (arch->objdump.register_char == 0)
+		return -1;
+
+	/*
+	 * It should start from offset, but it's possible to skip 0
+	 * in the asm.  So 0(%rax) should be same as (%rax).
+	 *
+	 * However, it also start with a segment select register like
+	 * %gs:0x18(%rbx).  In that case it should skip the part.
+	 */
+	if (*str == arch->objdump.register_char) {
+		/* FIXME: Handle other segment registers */
+		if (!strncmp(str, "%gs:", 4))
+			op_loc->segment = INSN_SEG_X86_GS;
+
+		while (*str && !isdigit(*str) &&
+		       *str != arch->objdump.memory_ref_char)
+			str++;
+	}
+
+	op_loc->offset = strtol(str, &p, 0);
+	op_loc->reg1 = arch__dwarf_regnum(arch, p);
+	if (op_loc->reg1 == -1)
+		return -1;
+
+	/* Get the second register */
+	if (op_loc->multi_regs)
+		op_loc->reg2 = arch__dwarf_regnum(arch, p + 1);
+
+	return 0;
+}
+
+static int extract_op_location_x86(const struct arch *arch,
+				   struct disasm_line *dl __maybe_unused,
+				   const char *op_str, int op_idx __maybe_unused,
+				   struct annotated_op_loc *op_loc)
+{
+	const char *s = op_str;
+	char *p = NULL;
+
+	if (op_str == NULL)
+		return 0;
+
+	if (strchr(op_str, arch->objdump.memory_ref_char)) {
+		op_loc->mem_ref = true;
+		return extract_reg_offset(arch, op_str, op_loc);
+	}
+
+	/* FIXME: Handle other segment registers */
+	if (!strncmp(op_str, "%gs:", 4)) {
+		op_loc->segment = INSN_SEG_X86_GS;
+		op_loc->offset = strtol(op_str + 4,
+					&p, 0);
+		if (p && p != op_str + 4)
+			op_loc->imm = true;
+		return 0;
+	}
+
+	if (*s == arch->objdump.register_char) {
+		op_loc->reg1 = arch__dwarf_regnum(arch, s);
+	} else if (*s == arch->objdump.imm_char) {
+		op_loc->offset = strtol(s + 1, &p, 0);
+		if (p && p != s + 1)
+			op_loc->imm = true;
+	}
+
+	return 0;
+}
+
 const struct arch *arch__new_x86(const struct e_machine_and_e_flags *id, const char *cpuid)
 {
 	struct arch *arch = zalloc(sizeof(*arch));
@@ -848,5 +929,6 @@ const struct arch *arch__new_x86(const struct e_machine_and_e_flags *id, const c
 #ifdef HAVE_LIBDW_SUPPORT
 	arch->update_insn_state = update_insn_state_x86;
 #endif
+	arch->extract_op_location = extract_op_location_x86;
 	return arch;
 }
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index ecd367582315..7d5cf57a110d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2474,51 +2474,6 @@ int arch__dwarf_regnum(const struct arch *arch, const char *str)
 	return reg;
 }
 
-/*
- * Get register number and access offset from the given instruction.
- * It assumes AT&T x86 asm format like OFFSET(REG).  Maybe it needs
- * to revisit the format when it handles different architecture.
- * Fills @reg and @offset when return 0.
- */
-static int extract_reg_offset(const struct arch *arch, const char *str,
-			      struct annotated_op_loc *op_loc)
-{
-	char *p;
-
-	if (arch->objdump.register_char == 0)
-		return -1;
-
-	/*
-	 * It should start from offset, but it's possible to skip 0
-	 * in the asm.  So 0(%rax) should be same as (%rax).
-	 *
-	 * However, it also start with a segment select register like
-	 * %gs:0x18(%rbx).  In that case it should skip the part.
-	 */
-	if (*str == arch->objdump.register_char) {
-		if (arch__is_x86(arch)) {
-			/* FIXME: Handle other segment registers */
-			if (!strncmp(str, "%gs:", 4))
-				op_loc->segment = INSN_SEG_X86_GS;
-		}
-
-		while (*str && !isdigit(*str) &&
-		       *str != arch->objdump.memory_ref_char)
-			str++;
-	}
-
-	op_loc->offset = strtol(str, &p, 0);
-	op_loc->reg1 = arch__dwarf_regnum(arch, p);
-	if (op_loc->reg1 == -1)
-		return -1;
-
-	/* Get the second register */
-	if (op_loc->multi_regs)
-		op_loc->reg2 = arch__dwarf_regnum(arch, p + 1);
-
-	return 0;
-}
-
 /**
  * annotate_get_insn_location - Get location of instruction
  * @arch: the architecture info
@@ -2549,6 +2504,7 @@ int annotate_get_insn_location(const struct arch *arch, struct disasm_line *dl,
 	struct ins_operands *ops;
 	struct annotated_op_loc *op_loc;
 	int i;
+	int ret;
 
 	if (ins__is_lock(&dl->ins))
 		ops = dl->ops.locked.ops;
@@ -2574,49 +2530,13 @@ int annotate_get_insn_location(const struct arch *arch, struct disasm_line *dl,
 		/* Invalidate the register by default */
 		op_loc->reg1 = -1;
 		op_loc->reg2 = -1;
+		op_loc->mem_ref = mem_ref;
+		op_loc->multi_regs = multi_regs;
 
-		if (insn_str == NULL) {
-			if (!arch__is_powerpc(arch))
-				continue;
-		}
-
-		/*
-		 * For powerpc, call get_powerpc_regs function which extracts the
-		 * required fields for op_loc, ie reg1, reg2, offset from the
-		 * raw instruction.
-		 */
-		if (arch__is_powerpc(arch)) {
-			op_loc->mem_ref = mem_ref;
-			op_loc->multi_regs = multi_regs;
-			get_powerpc_regs(dl->raw.raw_insn, !i, op_loc);
-		} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
-			op_loc->mem_ref = true;
-			op_loc->multi_regs = multi_regs;
-			extract_reg_offset(arch, insn_str, op_loc);
-		} else {
-			const char *s = insn_str;
-			char *p = NULL;
-
-			if (arch__is_x86(arch)) {
-				/* FIXME: Handle other segment registers */
-				if (!strncmp(insn_str, "%gs:", 4)) {
-					op_loc->segment = INSN_SEG_X86_GS;
-					op_loc->offset = strtol(insn_str + 4,
-								&p, 0);
-					if (p && p != insn_str + 4)
-						op_loc->imm = true;
-					continue;
-				}
-			}
-
-			if (*s == arch->objdump.register_char) {
-				op_loc->reg1 = arch__dwarf_regnum(arch, s);
-			}
-			else if (*s == arch->objdump.imm_char) {
-				op_loc->offset = strtol(s + 1, &p, 0);
-				if (p && p != s + 1)
-					op_loc->imm = true;
-			}
+		if (arch->extract_op_location) {
+			ret = arch->extract_op_location(arch, dl, insn_str, i, op_loc);
+			if (ret)
+				return ret;
 		}
 	}
 
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 1aa6df7d1618..5a0a913af53a 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -492,6 +492,8 @@ int annotate_parse_percent_type(const struct option *opt, const char *_str,
 
 int annotate_check_args(void);
 
+int arch__dwarf_regnum(const struct arch *arch, const char *str);
+
 /**
  * struct annotated_op_loc - Location info of instruction operand
  * @reg1: First register in the operand
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 06c664fd4fc2..8796f92769f9 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -16,6 +16,7 @@ struct symbol;
 struct data_loc_info;
 struct type_state;
 struct disasm_line;
+struct annotated_op_loc;
 
 struct e_machine_and_e_flags {
 	uint32_t e_flags;
@@ -49,6 +50,9 @@ struct arch {
 				struct data_loc_info *dloc, Dwarf_Die *cu_die,
 				struct disasm_line *dl);
 #endif
+	int		(*extract_op_location)(const struct arch *arch, struct disasm_line *dl,
+					       const char *op_str, int op_idx,
+					       struct annotated_op_loc *op_loc);
 };
 
 struct ins {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 09/21] perf annotate-arm64: Implement extract_op_location() callback
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (7 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 08/21] perf annotate: Introduce extract_op_location callback for arch-specific parsing Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:10   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 10/21] perf annotate: Deduplicate overlapping ARM SPE events for data type profiling Tengda Wu
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Implement the extract_op_location() callback for the arm64 architecture
to handle its specific assembly syntax and addressing modes.

The extractor handles:
1. Standalone immediate operands (e.g., #0x10).
2. Memory references with diverse addressing modes:
   - Signed offset: [base, #imm]
   - Pre-index: [base, #imm]!
   - Post-index: [base], #imm
3. Multi-register operands and primary/secondary register extraction.

With this callback in place, 'perf annotate' can now resolve memory
locations and register types on arm64, marking the initial enablement
of data type profiling for this architecture.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 64 +++++++++++++++++++
 tools/perf/util/annotate.h                    | 10 +++
 2 files changed, 74 insertions(+)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 451500def8c1..77b6f4841c84 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -261,6 +261,69 @@ static const struct ins_ops *arm64__associate_instruction_ops(struct arch *arch,
 	return ops;
 }
 
+static int extract_op_location_arm64(const struct arch *arch,
+				     struct disasm_line *dl __maybe_unused,
+				     const char *op_str, int op_idx __maybe_unused,
+				     struct annotated_op_loc *op_loc)
+{
+	const char *s = op_str;
+	char *p = NULL;
+
+	if (op_str == NULL)
+		return 0;
+
+	/* Handle standalone immediate operands (e.g., #0x10) */
+	if (*s == arch->objdump.imm_char) {
+		op_loc->offset = strtol(s + 1, &p, 0);
+		if (p && p != s + 1)
+			op_loc->imm = true;
+		return 0;
+	}
+
+	/*
+	 * Handle memory references (e.g., [x0, #8]), identify
+	 * arm64 specific addressing modes
+	 */
+	if (*s == arch->objdump.memory_ref_char) {
+		op_loc->mem_ref = true;
+
+		p = strchr(s, ']');
+		if (p == NULL)
+			return -1;
+
+		/* Pre-index: [base, #imm]! */
+		if (p[1] == '!')
+			op_loc->addr_mode = PERF_ADDR_MODE_PRE_INDEX;
+		/* Post-index: [base], #imm */
+		else if (p[1] == ',' && strchr(p + 1, arch->objdump.imm_char))
+			op_loc->addr_mode = PERF_ADDR_MODE_POST_INDEX;
+		/* Signed offset: [base{, #imm}] */
+		else
+			op_loc->addr_mode = PERF_ADDR_MODE_SIGNED_OFFSET;
+
+		s++;
+	}
+
+	/* Extract the primary register */
+	op_loc->reg1 = arch__dwarf_regnum(arch, s);
+	if (op_loc->reg1 == -1)
+		return -1;
+
+	/* Move to the next symbol of the operand, if any */
+	s = strchr(s, ',');
+	if (s == NULL)
+		return 0;
+	s = skip_spaces(s + 1);
+
+	/* Parse secondary register or immediate offset */
+	if (op_loc->multi_regs)
+		op_loc->reg2 = arch__dwarf_regnum(arch, s);
+	else if (*s == arch->objdump.imm_char)
+		op_loc->offset = strtol(s + 1, &p, 0);
+
+	return 0;
+}
+
 const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 				   const char *cpuid __maybe_unused)
 {
@@ -279,6 +342,7 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 	arch->objdump.memory_ref_char	  = '[';
 	arch->objdump.imm_char		  = '#';
 	arch->associate_instruction_ops   = arm64__associate_instruction_ops;
+	arch->extract_op_location	  = extract_op_location_arm64;
 
 	/* bl, blr */
 	err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 5a0a913af53a..670dc7944628 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -494,12 +494,21 @@ int annotate_check_args(void);
 
 int arch__dwarf_regnum(const struct arch *arch, const char *str);
 
+enum annotated_addr_mode {
+	PERF_ADDR_MODE_NONE = 0,
+
+	PERF_ADDR_MODE_SIGNED_OFFSET,
+	PERF_ADDR_MODE_PRE_INDEX,
+	PERF_ADDR_MODE_POST_INDEX,
+};
+
 /**
  * struct annotated_op_loc - Location info of instruction operand
  * @reg1: First register in the operand
  * @reg2: Second register in the operand
  * @offset: Memory access offset in the operand
  * @segment: Segment selector register
+ * @addr_mode: Addressing mode, only valid if @mem_ref is true
  * @mem_ref: Whether the operand accesses memory
  * @multi_regs: Whether the second register is used
  * @imm: Whether the operand is an immediate value (in offset)
@@ -509,6 +518,7 @@ struct annotated_op_loc {
 	int reg2;
 	int offset;
 	u8 segment;
+	u8 addr_mode;
 	bool mem_ref;
 	bool multi_regs;
 	bool imm;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 10/21] perf annotate: Deduplicate overlapping ARM SPE events for data type profiling
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (8 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 09/21] perf annotate-arm64: Implement extract_op_location() callback Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:06   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 11/21] perf auxtrace: Set default period to 1 for PERF_ITRACE_PERIOD_INSTRUCTIONS type Tengda Wu
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

When data type profiling is enabled on ARM SPE, multiple overlapping
events (e.g., l1d-miss, tlb-access) are synthesized for a single sampled
instruction, as shown below:

  Available samples
  0 arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=30/
  0 dummy:u
  84K l1d-miss
  95K l1d-access
  77K llc-miss
  58K llc-access
  9K tlb-miss
  108K tlb-access
  0 branch
  13K remote-access
  108K memory
  108K instructions

While 'perf report' provides an interactive menu for users to select a
specific event to prevent duplicate counting, 'perf annotate' lacks such
a mechanism. Consequently, it counts all instructions across these
overlapping events, which inflates the profile and distorts the data
type statistics.

Although using the '--itrace' option can work around this issue, for example:

  # perf annotate --data-type --stdio --itrace=i1i

it is inconvenient for users to specify this explicitly every time.

To address this, implement automatic deduplication in process_sample_event()
by retaining only the comprehensive "instructions" event when data type
annotation is enabled and no explicit itrace options are configured.

To achieve this, also set 'event->auxtrace_info.type' in arm_spe_prep_sample()
so that 'perf annotate' can correctly identify ARM SPE events during
sample processing.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 tools/perf/builtin-annotate.c | 13 +++++++++++++
 tools/perf/util/arm-spe.c     |  1 +
 2 files changed, 14 insertions(+)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 69cb72b2082a..46b2bd3fb201 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -283,9 +283,22 @@ static int process_sample_event(const struct perf_tool *tool,
 				struct machine *machine)
 {
 	struct perf_annotate *ann = container_of(tool, struct perf_annotate, tool);
+	struct evsel *evsel = sample->evsel;
 	struct addr_location al;
 	int ret = 0;
 
+	/*
+	 * ARM SPE synthesizes multiple overlapping events (e.g., l1d-miss,
+	 * tlb-access) for a single sampled instruction. When data type is
+	 * enabled, counting all of these overlapping events would inflate
+	 * the profile and distort the data type statistics. To avoid this,
+	 * deduplicate by retaining only the comprehensive "instructions" event.
+	 */
+	if (ann->data_type && !ann->session->itrace_synth_opts->set &&
+	    event->auxtrace_info.type == PERF_AUXTRACE_ARM_SPE &&
+	    strcmp(evsel->name, "instructions") != 0)
+		return 0;
+
 	addr_location__init(&al);
 	if (machine__resolve(machine, &al, sample) < 0) {
 		pr_warning("problem processing %s (%u) event at offset %#" PRIx64 ", skipping it.\n",
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 552f063f126e..e027e3b200bf 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -403,6 +403,7 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = sample->cpumode;
 	event->sample.header.size = sizeof(struct perf_event_header);
+	event->auxtrace_info.type = PERF_AUXTRACE_ARM_SPE;
 }
 
 static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 11/21] perf auxtrace: Set default period to 1 for PERF_ITRACE_PERIOD_INSTRUCTIONS type
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (9 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 10/21] perf annotate: Deduplicate overlapping ARM SPE events for data type profiling Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:05   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 12/21] perf annotate-data: Extract invalidate_reg_state() as a common helper Tengda Wu
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

When using --itrace=M for data type profiling on arm64, the 'Percent'
values in perf annotate output are all zero:

Annotate type: 'struct mmu_gather_batch' in [kernel.kallsyms] (15 samples):
============================================================================
 Percent     offset       size  field
    0.00          0       0x10  struct mmu_gather_batch  {
    0.00          0        0x8      struct mmu_gather_batch*    next;
    0.00        0x8        0x4      unsigned int        nr;
    0.00        0xc        0x4      unsigned int        max;
    0.00       0x10          0      struct encoded_page*[]      encoded_pages;
                                };

However, adding the -n option reveals non-zero sample counts:

Annotate type: 'struct mmu_gather_batch' in [kernel.kallsyms] (15 samples):
============================================================================
 Samples     offset       size  field
      15          0       0x10  struct mmu_gather_batch  {
      13          0        0x8      struct mmu_gather_batch*    next;
       2        0x8        0x4      unsigned int        nr;
       0        0xc        0x4      unsigned int        max;
       0       0x10          0      struct encoded_page*[]      encoded_pages;
                                };

The root cause is that when period is not explicitly specified in the
--itrace option, it remains zero after itrace_do_parse_synth_opts().
The zero period then flows through annotated_data_type__update_samples()
where h->period accumulates to zero, and print_annotated_data_type_value()
calculates the 'Percent' as zero.

In itrace_do_parse_synth_opts(), non-'iy' options have their period
type set to PERF_ITRACE_PERIOD_INSTRUCTIONS, but period remains zero.
Since a zero period is meaningless for this type, default to 1 (one
sample per instruction).

With this fix applied, the result is as follows:

Annotate type: 'struct mmu_gather_batch' in [kernel.kallsyms] (15 samples):
============================================================================
 Percent     offset       size  field
  100.00          0       0x10  struct mmu_gather_batch  {
   86.67          0        0x8      struct mmu_gather_batch*    next;
   13.33        0x8        0x4      unsigned int        nr;
    0.00        0xc        0x4      unsigned int        max;
    0.00       0x10          0      struct encoded_page*[]      encoded_pages;
                                };

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 tools/perf/util/auxtrace.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 0b851f32e98c..415b68a2bba9 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1759,6 +1759,12 @@ int itrace_do_parse_synth_opts(struct itrace_synth_opts *synth_opts,
 			synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
 	}
 
+	if (!period_set &&
+	    synth_opts->period_type == PERF_ITRACE_PERIOD_INSTRUCTIONS) {
+		/* Indicates a sample is taken for every instruction. */
+		synth_opts->period = 1;
+	}
+
 	return 0;
 
 out_err:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 12/21] perf annotate-data: Extract invalidate_reg_state() as a common helper
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (10 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 11/21] perf auxtrace: Set default period to 1 for PERF_ITRACE_PERIOD_INSTRUCTIONS type Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  3:53 ` [PATCH v3 13/21] perf annotate-arm64: Enable instruction tracking support Tengda Wu
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

The invalidate_reg_state() function is currently static and only used
within the x86 data type profiling implementation. To prepare for adding
arm64 instruction tracking support, extract it as a common helper that
can be shared across architectures.

No functional change intended.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 tools/perf/util/annotate-arch/annotate-x86.c | 9 ---------
 tools/perf/util/annotate-data.c              | 9 +++++++++
 tools/perf/util/annotate-data.h              | 1 +
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
index 6a4b4bdcd7a7..ef84b9593a39 100644
--- a/tools/perf/util/annotate-arch/annotate-x86.c
+++ b/tools/perf/util/annotate-arch/annotate-x86.c
@@ -206,15 +206,6 @@ static int x86__cpuid_parse(struct arch *arch, const char *cpuid)
 }
 
 #ifdef HAVE_LIBDW_SUPPORT
-static void invalidate_reg_state(struct type_state_reg *reg)
-{
-	reg->kind = TSR_KIND_INVALID;
-	reg->ok = false;
-	reg->lifetime_active = false;
-	reg->lifetime_end = 0;
-	reg->copied_from = -1;
-}
-
 static void update_insn_state_x86(struct type_state *state,
 				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
 				  struct disasm_line *dl)
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 4e4c58764082..104b80d471f1 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -638,6 +638,15 @@ struct type_state_stack *findnew_stack_state(struct type_state *state,
 	return stack;
 }
 
+void invalidate_reg_state(struct type_state_reg *reg)
+{
+	reg->kind = TSR_KIND_INVALID;
+	reg->ok = false;
+	reg->lifetime_active = false;
+	reg->lifetime_end = 0;
+	reg->copied_from = -1;
+}
+
 /* Maintain a cache for quick global variable lookup */
 struct global_var_entry {
 	struct rb_node node;
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index c26130744260..453e13bbe3e2 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -258,6 +258,7 @@ void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
 				Dwarf_Die *type_die, int ptr_offset);
 struct type_state_stack *find_stack_state(struct type_state *state,
 						int offset);
+void invalidate_reg_state(struct type_state_reg *reg);
 bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
 				u64 ip, u64 var_addr, int *var_offset,
 				Dwarf_Die *type_die);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 13/21] perf annotate-arm64: Enable instruction tracking support
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (11 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 12/21] perf annotate-data: Extract invalidate_reg_state() as a common helper Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:12   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 14/21] perf annotate-arm64: Support load instruction tracking Tengda Wu
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Enable basic instruction tracking for arm64 by implementing three
essential functions in the find_data_type_block() call path:

  find_data_type_block
    -> arch_supports_insn_tracking  (1)
    -> find_data_type_insn
         -> init_type_state         (2)
         -> update_insn_state       (3)

Changes:

* arch_supports_insn_tracking(): add arm64 to the list of supported
  architectures, allowing find_data_type_block() to proceed with data
  type analysis.

* init_type_state(): correctly identify ret_reg, stack_reg, and
  caller-saved registers for arm64 during type state initialization.

* update_insn_state(): add the update_insn_state_arm64() callback
  with no instruction support yet. More instructions will be enabled
  gradually in later patches.

With these changes, arm64 gains support for basic variable type
inference during instruction tracking.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 30 +++++++++++++++++++
 tools/perf/util/annotate-data.c               | 11 ++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 77b6f4841c84..48b1c94273eb 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -9,6 +9,8 @@
 #include <regex.h>
 #include "../annotate.h"
 #include "../disasm.h"
+#include "../annotate-data.h"
+#include "../debug.h"
 
 struct arch_arm64 {
 	struct arch arch;
@@ -324,6 +326,31 @@ static int extract_op_location_arm64(const struct arch *arch,
 	return 0;
 }
 
+#ifdef HAVE_LIBDW_SUPPORT
+static void update_insn_state_arm64(struct type_state *state,
+				    struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
+				    struct disasm_line *dl)
+{
+	struct annotated_insn_loc loc;
+	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
+	u32 insn_offset = dl->al.offset;
+
+	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
+		return;
+
+	/*
+	 * For unsupported instructions with a destination register, invalidate
+	 * the destination register itself to prevent incorrect type propagation.
+	 */
+	if (has_reg_type(state, dst->reg1)) {
+		pr_debug_dtp("%s [%x] invalidate reg%d\n",
+			     dl->ins.name, insn_offset, dst->reg1);
+		invalidate_reg_state(&state->regs[dst->reg1]);
+		return;
+	}
+}
+#endif
+
 const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 				   const char *cpuid __maybe_unused)
 {
@@ -343,6 +370,9 @@ const struct arch *arch__new_arm64(const struct e_machine_and_e_flags *id,
 	arch->objdump.imm_char		  = '#';
 	arch->associate_instruction_ops   = arm64__associate_instruction_ops;
 	arch->extract_op_location	  = extract_op_location_arm64;
+#ifdef HAVE_LIBDW_SUPPORT
+	arch->update_insn_state		  = update_insn_state_arm64;
+#endif
 
 	/* bl, blr */
 	err = regcomp(&arm->call_insn, "^blr?$", REG_EXTENDED);
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 104b80d471f1..c51b20a7af9b 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -28,6 +28,7 @@
 
 /* register number of the stack pointer */
 #define X86_REG_SP 7
+#define ARM64_REG_SP 31
 
 static void delete_var_types(struct die_var_type *var_types);
 
@@ -178,6 +179,13 @@ static void init_type_state(struct type_state *state, const struct arch *arch)
 		state->regs[11].caller_saved = true;
 		state->ret_reg = 0;
 		state->stack_reg = X86_REG_SP;
+	} else if (arch__is_arm64(arch)) {
+		int i;
+
+		for (i = 0; i < 18; i++)
+			state->regs[i].caller_saved = true;
+		state->ret_reg = 0;
+		state->stack_reg = ARM64_REG_SP;
 	}
 }
 
@@ -1437,7 +1445,8 @@ static enum type_match_result find_data_type_insn(struct data_loc_info *dloc,
 
 static int arch_supports_insn_tracking(struct data_loc_info *dloc)
 {
-	if ((arch__is_x86(dloc->arch)) || (arch__is_powerpc(dloc->arch)))
+	if (arch__is_x86(dloc->arch) || arch__is_powerpc(dloc->arch) ||
+	    arch__is_arm64(dloc->arch))
 		return 1;
 	return 0;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 14/21] perf annotate-arm64: Support load instruction tracking
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (12 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 13/21] perf annotate-arm64: Enable instruction tracking support Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:14   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 15/21] perf annotate-arm64: Support store " Tengda Wu
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Extend update_insn_state_arm64() to handle LDR instructions, tracking
register state changes when data is loaded from memory to registers.
This is essential for maintaining accurate type information when data
is moved from memory to registers.

The implementation handles the three primary arm64 addressing modes:
1. Signed offset: [base, #imm]
2. Pre-index: [base, #imm]!
3. Post-index: [base], #imm

Introduce adjust_reg_index_state() to handle the side effects of pre-index
and post-index addressing, where the base register is updated with the
offset after or before the memory access. This ensures that the register's
offset within a structure is correctly tracked across sequential
instructions.

A real-world example is shown below:

  ffff80008011f5b0 <pick_task_stop>:
  ffff80008011f5b8:  ldr  x0, [x0, #2712] // x0: struct rq* -> task_struct*
* ffff80008011f5c0:  ldr  w1, [x0, #104]

Before this commit, the type of x0 was incorrectly inferred as 'struct rq':

  find data type for 0x68(reg0) at pick_task_stop+0x10
  var [8] reg0 offset 0 type='struct rq*'
  chk [10] reg0 offset=0x68 ok=1 kind=1 (struct rq*) : Good!
  final result:  type='struct rq'

After this commit, the type of x0 is correctly inferred as 'struct task_struct':

  find data type for 0x68(reg0) at pick_task_stop+0x10
  var [8] reg0 offset 0 type='struct rq*'
  ldr [8] 0xa98(reg0) -> reg0 type='struct task_struct*'
  chk [10] reg0 offset=0x68 ok=1 kind=1 (struct task_struct*) : Good!
  final result: type='struct task_struct'

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 96 ++++++++++++++++++-
 1 file changed, 95 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 48b1c94273eb..b48545f35620 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -327,11 +327,98 @@ static int extract_op_location_arm64(const struct arch *arch,
 }
 
 #ifdef HAVE_LIBDW_SUPPORT
+static int get_reg_index_offset(struct annotated_op_loc *op_loc)
+{
+	return op_loc->addr_mode == PERF_ADDR_MODE_POST_INDEX ? 0 : op_loc->offset;
+}
+
+static void adjust_reg_index_state(struct type_state *state,
+				   struct annotated_op_loc *op_loc,
+				   const char *insn_name, u32 insn_offset)
+{
+	struct type_state_reg *tsr;
+	int reg = op_loc->reg1;
+
+	if (op_loc->addr_mode != PERF_ADDR_MODE_PRE_INDEX &&
+	    op_loc->addr_mode != PERF_ADDR_MODE_POST_INDEX)
+		return;
+
+	if (!has_reg_type(state, reg) || !state->regs[reg].ok)
+		return;
+
+	tsr = &state->regs[reg];
+	tsr->copied_from = -1;
+	tsr->offset = op_loc->offset + tsr->offset;
+
+	pr_debug_dtp("%s [%x] %s-index %#x(reg%d) -> reg%d", insn_name,
+		     insn_offset, op_loc->addr_mode == PERF_ADDR_MODE_PRE_INDEX ?
+		     "pre" : "post", op_loc->offset, reg, reg);
+	pr_debug_type_name(&tsr->type, tsr->kind);
+}
+
+static void update_load_insn_state(struct type_state *state,
+				   struct disasm_line *dl,
+				   struct annotated_op_loc *src,
+				   struct annotated_op_loc *dst)
+{
+	struct type_state_reg *tsr;
+	struct type_state_reg src_tsr;
+	Dwarf_Die type_die;
+	u32 insn_offset = dl->al.offset;
+	int reg_offset;
+	int sreg = src->reg1;
+	int dreg = dst->reg1;
+
+	if (!has_reg_type(state, dreg))
+		goto out_adjust;
+
+	tsr = &state->regs[dreg];
+	tsr->copied_from = -1;
+
+retry:
+	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok) {
+		invalidate_reg_state(tsr);
+		return;
+	}
+
+	src_tsr = state->regs[sreg];
+	reg_offset = get_reg_index_offset(src);
+
+	/* Dereference the pointer if it has one */
+	if (src_tsr.kind == TSR_KIND_TYPE &&
+	    die_deref_ptr_type(&src_tsr.type,
+			       src_tsr.offset + reg_offset, &type_die)) {
+		tsr->type = type_die;
+		tsr->kind = TSR_KIND_TYPE;
+		tsr->offset = 0;
+		tsr->ok = true;
+
+		if (src->multi_regs) {
+			pr_debug_dtp("ldr [%x] %#x(reg%d, reg%d) -> reg%d",
+				     insn_offset, reg_offset, src->reg1,
+				     src->reg2, dreg);
+		} else {
+			pr_debug_dtp("ldr [%x] %#x(reg%d) -> reg%d",
+				     insn_offset, reg_offset, sreg, dreg);
+		}
+		pr_debug_type_name(&tsr->type, tsr->kind);
+	}
+	/* Or try another register if any */
+	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
+		sreg = src->reg2;
+		goto retry;
+	}
+
+out_adjust:
+	adjust_reg_index_state(state, src, "ldr", insn_offset);
+}
+
 static void update_insn_state_arm64(struct type_state *state,
 				    struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
 				    struct disasm_line *dl)
 {
 	struct annotated_insn_loc loc;
+	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
 	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
 	u32 insn_offset = dl->al.offset;
 
@@ -342,12 +429,19 @@ static void update_insn_state_arm64(struct type_state *state,
 	 * For unsupported instructions with a destination register, invalidate
 	 * the destination register itself to prevent incorrect type propagation.
 	 */
-	if (has_reg_type(state, dst->reg1)) {
+	if (has_reg_type(state, dst->reg1) &&
+	    strncmp(dl->ins.name, "ld", 2)) {
 		pr_debug_dtp("%s [%x] invalidate reg%d\n",
 			     dl->ins.name, insn_offset, dst->reg1);
 		invalidate_reg_state(&state->regs[dst->reg1]);
 		return;
 	}
+
+	/* Memory to register transfers */
+	if (!strncmp(dl->ins.name, "ld", 2)) {
+		update_load_insn_state(state, dl, src, dst);
+		return;
+	}
 }
 #endif
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 15/21] perf annotate-arm64: Support store instruction tracking
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (13 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 14/21] perf annotate-arm64: Support load instruction tracking Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  3:53 ` [PATCH v3 16/21] perf annotate-arm64: Support stack variable tracking Tengda Wu
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Extend update_insn_state() for arm64 to handle store (STR) instructions.

Unlike load instructions, store operations do not change the data type
of the registers involved. However, arm64 store instructions sometimes
use pre-index or post-index addressing modes (e.g., str x1, [x0, #8]!),
which modify the base register as a side effect of the memory access.

Call adjust_reg_index_state() for store instructions to ensure the
base register's offset is correctly updated in the type state. This
maintains synchronization between the hardware register state and the
instruction tracker's model during sequential memory operations.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 tools/perf/util/annotate-arch/annotate-arm64.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index b48545f35620..f283596bb058 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -430,7 +430,7 @@ static void update_insn_state_arm64(struct type_state *state,
 	 * the destination register itself to prevent incorrect type propagation.
 	 */
 	if (has_reg_type(state, dst->reg1) &&
-	    strncmp(dl->ins.name, "ld", 2)) {
+	    strncmp(dl->ins.name, "ld", 2) && strncmp(dl->ins.name, "st", 2)) {
 		pr_debug_dtp("%s [%x] invalidate reg%d\n",
 			     dl->ins.name, insn_offset, dst->reg1);
 		invalidate_reg_state(&state->regs[dst->reg1]);
@@ -442,6 +442,16 @@ static void update_insn_state_arm64(struct type_state *state,
 		update_load_insn_state(state, dl, src, dst);
 		return;
 	}
+
+	/* Register to memory transfers */
+	if (!strncmp(dl->ins.name, "st", 2)) {
+		/*
+		 * Store instructions do not change the register type,
+		 * but the base register must be updated for pre/post-index
+		 * modes.
+		 */
+		adjust_reg_index_state(state, dst, "str", insn_offset);
+	}
 }
 #endif
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 16/21] perf annotate-arm64: Support stack variable tracking
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (14 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 15/21] perf annotate-arm64: Support store " Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:16   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 17/21] perf annotate-arm64: Support 'mov' instruction tracking Tengda Wu
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Extend update_insn_state() for arm64 to track data types stored on the
stack. This allows 'perf annotate' to maintain type information for
local variables that are spilled to or loaded from stack slots.

The implementation handles:

1. Stack Loads (LDR): Identify when a register is loaded from a stack
   slot and update the register's type state based on the tracked
   stack content or compound member types.

2. Stack Stores (STR): Update or create new stack state entries when
   a tracked register type is stored to the stack.

With these changes, the instruction tracker can now follow data types
as they move between registers and memory, specifically for function
local variables and compiler-spilled values on arm64.

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 124 +++++++++++++++++-
 1 file changed, 117 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index f283596bb058..255ccaf33130 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -11,6 +11,8 @@
 #include "../disasm.h"
 #include "../annotate-data.h"
 #include "../debug.h"
+#include "../map.h"
+#include "../symbol.h"
 
 struct arch_arm64 {
 	struct arch arch;
@@ -357,6 +359,7 @@ static void adjust_reg_index_state(struct type_state *state,
 }
 
 static void update_load_insn_state(struct type_state *state,
+				   struct data_loc_info *dloc,
 				   struct disasm_line *dl,
 				   struct annotated_op_loc *src,
 				   struct annotated_op_loc *dst)
@@ -368,6 +371,8 @@ static void update_load_insn_state(struct type_state *state,
 	int reg_offset;
 	int sreg = src->reg1;
 	int dreg = dst->reg1;
+	int fbreg = dloc->fbreg;
+	int fboff = 0;
 
 	if (!has_reg_type(state, dreg))
 		goto out_adjust;
@@ -375,7 +380,52 @@ static void update_load_insn_state(struct type_state *state,
 	tsr = &state->regs[dreg];
 	tsr->copied_from = -1;
 
+	if (dloc->fb_cfa) {
+		u64 ip = dloc->ms->sym->start + dl->al.offset;
+		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
+
+		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
+			fbreg = -1;
+	}
+
 retry:
+	/* Check stack variables with offset */
+	if (sreg == fbreg || sreg == state->stack_reg) {
+		struct type_state_stack *stack;
+		int offset = src->offset - fboff;
+
+		stack = find_stack_state(state, offset);
+		if (stack == NULL) {
+			invalidate_reg_state(tsr);
+			goto out_adjust;
+		} else if (!stack->compound) {
+			tsr->type = stack->type;
+			tsr->kind = stack->kind;
+			tsr->offset = stack->ptr_offset;
+			tsr->ok = true;
+		} else if (die_get_member_type(&stack->type,
+					       offset - stack->offset,
+					       &type_die)) {
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->offset = 0;
+			tsr->ok = true;
+		} else {
+			invalidate_reg_state(tsr);
+			goto out_adjust;
+		}
+
+		if (sreg == fbreg) {
+			pr_debug_dtp("ldr [%x] -%#x(stack) -> reg%d",
+				     insn_offset, -offset, dreg);
+		} else {
+			pr_debug_dtp("ldr [%x] %#x(reg%d) -> reg%d",
+				     insn_offset, offset, sreg, dreg);
+		}
+		pr_debug_type_name(&tsr->type, tsr->kind);
+		goto out_adjust;
+	}
+
 	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok) {
 		invalidate_reg_state(tsr);
 		return;
@@ -413,6 +463,70 @@ static void update_load_insn_state(struct type_state *state,
 	adjust_reg_index_state(state, src, "ldr", insn_offset);
 }
 
+static void update_store_insn_state(struct type_state *state,
+				    struct data_loc_info *dloc,
+				    struct disasm_line *dl,
+				    struct annotated_op_loc *src,
+				    struct annotated_op_loc *dst)
+{
+	struct type_state_reg *tsr;
+	u32 insn_offset = dl->al.offset;
+	int sreg = src->reg1;
+	int dreg = dst->reg1;
+	int fbreg = dloc->fbreg;
+	int fboff = 0;
+
+	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok)
+		goto out_adjust;
+
+	if (dloc->fb_cfa) {
+		u64 ip = dloc->ms->sym->start + dl->al.offset;
+		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
+
+		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
+			fbreg = -1;
+	}
+
+	/* Check stack variables with offset */
+	if (dreg == fbreg || dreg == state->stack_reg) {
+		struct type_state_stack *stack;
+		int offset = dst->offset - fboff;
+
+		tsr = &state->regs[sreg];
+
+		stack = find_stack_state(state, offset);
+		if (stack) {
+			if (!stack->compound)
+				set_stack_state(stack, offset, tsr->kind,
+						&tsr->type, tsr->offset);
+		} else {
+			findnew_stack_state(state, offset, tsr->kind,
+					    &tsr->type, tsr->offset);
+		}
+
+		if (dst->reg1 == fbreg) {
+			pr_debug_dtp("str [%x] reg%d -> -%#x(stack)",
+				     insn_offset, sreg, -offset);
+		} else {
+			pr_debug_dtp("str [%x] reg%d -> %#x(reg%d)",
+				     insn_offset, sreg, offset, dreg);
+		}
+		if (tsr->offset != 0) {
+			pr_debug_dtp(" reg%d offset %#x ->",
+				     sreg, tsr->offset);
+		}
+		pr_debug_type_name(&tsr->type, tsr->kind);
+	}
+
+out_adjust:
+	/*
+	 * Store instructions do not change the register type,
+	 * but the base register must be updated for pre/post-index
+	 * modes.
+	 */
+	adjust_reg_index_state(state, dst, "str", insn_offset);
+}
+
 static void update_insn_state_arm64(struct type_state *state,
 				    struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
 				    struct disasm_line *dl)
@@ -439,18 +553,14 @@ static void update_insn_state_arm64(struct type_state *state,
 
 	/* Memory to register transfers */
 	if (!strncmp(dl->ins.name, "ld", 2)) {
-		update_load_insn_state(state, dl, src, dst);
+		update_load_insn_state(state, dloc, dl, src, dst);
 		return;
 	}
 
 	/* Register to memory transfers */
 	if (!strncmp(dl->ins.name, "st", 2)) {
-		/*
-		 * Store instructions do not change the register type,
-		 * but the base register must be updated for pre/post-index
-		 * modes.
-		 */
-		adjust_reg_index_state(state, dst, "str", insn_offset);
+		update_store_insn_state(state, dloc, dl, src, dst);
+		return;
 	}
 }
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 17/21] perf annotate-arm64: Support 'mov' instruction tracking
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (15 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 16/21] perf annotate-arm64: Support stack variable tracking Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:21   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 18/21] perf annotate-arm64: Support 'add' " Tengda Wu
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Extend update_insn_state() for arm64 to support register-to-register and
immediate-to-register 'mov' instructions.

For register-to-register 'mov' (e.g., mov dreg, sreg), propagate data type
information from the source register to the destination register.

For immediate-to-register 'mov' (e.g., mov dreg, #imm), store the immediate
value in the destination register's imm_value field and set its kind to
TSR_KIND_CONST, allowing subsequent instructions to resolve it as a
constant.

A real-world example is shown below:

  ffff8000803eebf8 <get_vma_policy>:
  ffff8000803eec20:  mov  x21, x0          // x0 (struct vm_area_struct*) -> x21
  ffff8000803eec28:  ldr  x2, [x0, #112]
  ffff8000803eec2c:  cbz  x2, ffff8000803eec94 <get_vma_policy+0x9c>
* ffff8000803eec94:  ldr  x0, [x21, #152]

Before this commit, the type of x21 was unknown, causing the subsequent
inference to fail:

  var [0] reg0 offset 0 type='struct vm_area_struct*' size=0x8
  chk [9c] reg21 offset=0x98 ok=0 kind=0 cfa : no type information
  final result: no type information

After this commit, the type of x21 is correctly inferred as 'vm_area_struct':

  var [0] reg0 offset 0 type='struct vm_area_struct*' size=0x8
  mov [28] reg0 -> reg21 type='struct vm_area_struct*' size=0x8
  chk [9c] reg21 offset=0x98 ok=1 kind=1 (struct vm_area_struct*) : Good!
  found by insn track: 0x98(reg21) type-offset=0x98
  final result:  type='struct vm_area_struct' size=0xb0

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 52 +++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 255ccaf33130..be07a85057c8 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -527,6 +527,51 @@ static void update_store_insn_state(struct type_state *state,
 	adjust_reg_index_state(state, dst, "str", insn_offset);
 }
 
+static void update_mov_insn_state(struct type_state *state,
+				  struct disasm_line *dl,
+				  struct annotated_op_loc *src,
+				  struct annotated_op_loc *dst)
+{
+	struct type_state_reg *tsr;
+	u32 insn_offset = dl->al.offset;
+	int sreg = src->reg1;
+	int dreg = dst->reg1;
+
+	if (!has_reg_type(state, dreg))
+		return;
+
+	tsr = &state->regs[dreg];
+	tsr->copied_from = -1;
+
+	if (src->imm) {
+		tsr->kind = TSR_KIND_CONST;
+		tsr->imm_value = src->offset;
+		tsr->offset = 0;
+		tsr->ok = true;
+
+		pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
+			     insn_offset, tsr->imm_value, dreg);
+		return;
+	}
+
+	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok) {
+		invalidate_reg_state(tsr);
+		return;
+	}
+
+	tsr->type = state->regs[sreg].type;
+	tsr->kind = state->regs[sreg].kind;
+	tsr->imm_value = state->regs[sreg].imm_value;
+	tsr->offset = state->regs[sreg].offset;
+	tsr->ok = state->regs[sreg].ok;
+
+	if (tsr->kind == TSR_KIND_TYPE || tsr->kind == TSR_KIND_POINTER)
+		tsr->copied_from = sreg;
+
+	pr_debug_dtp("mov [%x] reg%d -> reg%d", insn_offset, sreg, dreg);
+	pr_debug_type_name(&tsr->type, tsr->kind);
+}
+
 static void update_insn_state_arm64(struct type_state *state,
 				    struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
 				    struct disasm_line *dl)
@@ -544,6 +589,7 @@ static void update_insn_state_arm64(struct type_state *state,
 	 * the destination register itself to prevent incorrect type propagation.
 	 */
 	if (has_reg_type(state, dst->reg1) &&
+	    strcmp(dl->ins.name, "mov") &&
 	    strncmp(dl->ins.name, "ld", 2) && strncmp(dl->ins.name, "st", 2)) {
 		pr_debug_dtp("%s [%x] invalidate reg%d\n",
 			     dl->ins.name, insn_offset, dst->reg1);
@@ -551,6 +597,12 @@ static void update_insn_state_arm64(struct type_state *state,
 		return;
 	}
 
+	/* Register to register or imm value to register transfers */
+	if (!strcmp(dl->ins.name, "mov")) {
+		update_mov_insn_state(state, dl, src, dst);
+		return;
+	}
+
 	/* Memory to register transfers */
 	if (!strncmp(dl->ins.name, "ld", 2)) {
 		update_load_insn_state(state, dloc, dl, src, dst);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 18/21] perf annotate-arm64: Support 'add' instruction tracking
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (16 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 17/21] perf annotate-arm64: Support 'mov' instruction tracking Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:16   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 19/21] perf annotate-arm64: Support 'adrp' instruction to track global variables Tengda Wu
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Extend update_insn_state() for arm64 to track 'add' instructions for
structure member address calculation, which commonly appear as:

  add  dreg, base, #offset
  add  dreg, base, reg2     (reg2 holds a constant)

For such instructions, determine the final offset value from either the
immediate or the constant in reg2. If the base register contains a
pointer or structure type, compute the resulting offset and invoke
die_get_member_type() to verify it corresponds to a valid member within
that type. On success, update the destination register's type state with
the new offset while retaining the base type information.

A real-world example is shown below:

  ffff80008001c9a8 <flush_ptrace_hw_breakpoint>:
  ffff80008001c9c4:  add  x19, x0, #0xeb8   // x0 (task_struct*) + 0xeb8 -> x19
* ffff80008001c9d0:  ldr  x0, [x19]

Before this commit, the type flow broke at the 'add' instruction,
leaving the subsequent load with no type information:

  chk [28] reg19 offset=0 ok=0 kind=0 cfa : no type information
  final result: no type information

After this commit, the tracker correctly follows the member address
calculation:

  var [0] reg0 offset 0 type='struct task_struct*'
  add [1c] address of 0xeb8(reg0) -> reg19 type='struct task_struct*'
  chk [28] reg19 offset=0 ok=1 kind=1 (struct task_struct*) : Good!
  found by insn track: 0(reg19) type-offset=0xeb8
  final result: type='struct task_struct'

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 91 ++++++++++++++++++-
 1 file changed, 90 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index be07a85057c8..1fed18811719 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -572,6 +572,90 @@ static void update_mov_insn_state(struct type_state *state,
 	pr_debug_type_name(&tsr->type, tsr->kind);
 }
 
+static void update_add_insn_state(struct type_state *state,
+				  struct disasm_line *dl,
+				  struct annotated_op_loc *src,
+				  struct annotated_op_loc *dst)
+{
+	struct type_state_reg *tsr;
+	struct type_state_reg src_tsr;
+	Dwarf_Die type_die;
+	u32 insn_offset = dl->al.offset;
+	int sreg = src->reg1;
+	int dreg = dst->reg1;
+	int reg_offset;
+
+	if (!has_reg_type(state, dreg))
+		return;
+
+	tsr = &state->regs[dreg];
+	tsr->copied_from = -1;
+
+retry:
+	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok) {
+		invalidate_reg_state(tsr);
+		return;
+	}
+
+	src_tsr = state->regs[sreg];
+
+	/*
+	 * Handle 'add' instructions of the form:
+	 *   add  dreg, base, #offset     (immediate offset)
+	 *   add  dreg, base, reg2        (reg2 holds a constant)
+	 *
+	 * For case 2, retrieve the constant value from reg2
+	 * and use it as the offset.
+	 */
+	reg_offset = src->offset;
+	if (src->multi_regs) {
+		int reg2 = (sreg == src->reg1) ? src->reg2 : src->reg1;
+
+		if (has_reg_type(state, reg2) && state->regs[reg2].ok &&
+		    state->regs[reg2].kind == TSR_KIND_CONST)
+			reg_offset = state->regs[reg2].imm_value;
+	}
+
+	/* Handle calculation of a register holding a typed pointer */
+	if (src_tsr.kind == TSR_KIND_POINTER ||
+	    (src_tsr.kind == TSR_KIND_TYPE &&
+	     dwarf_tag(&src_tsr.type) == DW_TAG_pointer_type)) {
+		s32 offset;
+
+		if (src_tsr.kind == TSR_KIND_TYPE &&
+		    __die_get_real_type(&src_tsr.type, &type_die) == NULL) {
+			invalidate_reg_state(tsr);
+			return;
+		}
+
+		if (src_tsr.kind == TSR_KIND_POINTER)
+			type_die = src_tsr.type;
+
+		/* Check if the target type has a member at the new offset */
+		offset = reg_offset + src_tsr.offset;
+		if (die_get_member_type(&type_die, offset, &type_die) == NULL) {
+			invalidate_reg_state(tsr);
+			return;
+		}
+
+		tsr->type = src_tsr.type;
+		tsr->kind = src_tsr.kind;
+		tsr->offset = offset;
+		tsr->ok = src_tsr.ok;
+
+		pr_debug_dtp("add [%x] address of %s%#x(reg%d) -> reg%d",
+			     insn_offset, reg_offset < 0 ? "-" : "",
+			     abs(reg_offset), sreg, dreg);
+
+		pr_debug_type_name(&tsr->type, tsr->kind);
+	}
+	/* Or try another register if any */
+	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
+		sreg = src->reg2;
+		goto retry;
+	}
+}
+
 static void update_insn_state_arm64(struct type_state *state,
 				    struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
 				    struct disasm_line *dl)
@@ -589,7 +673,7 @@ static void update_insn_state_arm64(struct type_state *state,
 	 * the destination register itself to prevent incorrect type propagation.
 	 */
 	if (has_reg_type(state, dst->reg1) &&
-	    strcmp(dl->ins.name, "mov") &&
+	    strcmp(dl->ins.name, "add") && strcmp(dl->ins.name, "mov") &&
 	    strncmp(dl->ins.name, "ld", 2) && strncmp(dl->ins.name, "st", 2)) {
 		pr_debug_dtp("%s [%x] invalidate reg%d\n",
 			     dl->ins.name, insn_offset, dst->reg1);
@@ -597,6 +681,11 @@ static void update_insn_state_arm64(struct type_state *state,
 		return;
 	}
 
+	if (!strcmp(dl->ins.name, "add")) {
+		update_add_insn_state(state, dl, src, dst);
+		return;
+	}
+
 	/* Register to register or imm value to register transfers */
 	if (!strcmp(dl->ins.name, "mov")) {
 		update_mov_insn_state(state, dl, src, dst);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 19/21] perf annotate-arm64: Support 'adrp' instruction to track global variables
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (17 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 18/21] perf annotate-arm64: Support 'add' " Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:15   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 20/21] perf annotate-arm64: Support per-cpu variable access tracking Tengda Wu
  2026-07-01  3:53 ` [PATCH v3 21/21] perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer Tengda Wu
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Extend update_insn_state() for arm64 to track global variable types
calculated via page-relative addressing.

On arm64, global variables are typically accessed by first calculating
the page address using 'adrp', followed by an 'add' or 'ldr' to get the
specific symbol address. Without tracking 'adrp', the instruction
tracker loses the base address, making it impossible to resolve
global symbols and their associated DWARF types.

Introduce TSR_KIND_GLOBAL_ADDR to represent a partial global address
state. When encountering 'adrp', store the page-aligned target address
in the register's type state. Upon a subsequent 'add' or 'ldr'
instruction that references a TSR_KIND_GLOBAL_ADDR register, combine
the page address with the immediate offset.

A real-world example is shown below:

  ffff80008032e008 <folios_put_refs>:
  ffff80008032e048:  adrp  x24, ffff80008202f000 <nr_cpu_ids>
  ffff80008032e050:  add   x24, x24, #0xd40
* ffff80008032e078:  ldr   x0, [x24]

Before this commit, x24 was unknown, leading to no type information:

  chk [70] reg24 offset=0 ok=0 kind=0 cfa : no type information
  final result: no type information

After this commit, the tracker correctly follows the adrp/add flow:

  adrp [40] global addr=0xffff80008202f000 -> reg24
  add [48] global 0xd40(reg24) -> reg24
  chk [70] reg24 offset=0 ok=1 kind=7 global addr : Good!
  final result: type='struct folio*'

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 89 +++++++++++++++++--
 tools/perf/util/annotate-arch/annotate-x86.c  |  6 +-
 tools/perf/util/annotate-data.c               | 33 +++++--
 tools/perf/util/annotate-data.h               |  7 +-
 4 files changed, 121 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 1fed18811719..6f96e75d313d 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -7,6 +7,7 @@
 #include <linux/string.h>
 #include <linux/ctype.h>
 #include <regex.h>
+#include <inttypes.h>
 #include "../annotate.h"
 #include "../disasm.h"
 #include "../annotate-data.h"
@@ -359,7 +360,7 @@ static void adjust_reg_index_state(struct type_state *state,
 }
 
 static void update_load_insn_state(struct type_state *state,
-				   struct data_loc_info *dloc,
+				   struct data_loc_info *dloc, Dwarf_Die *cu_die,
 				   struct disasm_line *dl,
 				   struct annotated_op_loc *src,
 				   struct annotated_op_loc *dst)
@@ -402,6 +403,7 @@ static void update_load_insn_state(struct type_state *state,
 			tsr->type = stack->type;
 			tsr->kind = stack->kind;
 			tsr->offset = stack->ptr_offset;
+			tsr->addr = stack->addr;
 			tsr->ok = true;
 		} else if (die_get_member_type(&stack->type,
 					       offset - stack->offset,
@@ -409,6 +411,7 @@ static void update_load_insn_state(struct type_state *state,
 			tsr->type = type_die;
 			tsr->kind = TSR_KIND_TYPE;
 			tsr->offset = 0;
+			tsr->addr = 0;
 			tsr->ok = true;
 		} else {
 			invalidate_reg_state(tsr);
@@ -441,6 +444,7 @@ static void update_load_insn_state(struct type_state *state,
 		tsr->type = type_die;
 		tsr->kind = TSR_KIND_TYPE;
 		tsr->offset = 0;
+		tsr->addr = 0;
 		tsr->ok = true;
 
 		if (src->multi_regs) {
@@ -453,6 +457,33 @@ static void update_load_insn_state(struct type_state *state,
 		}
 		pr_debug_type_name(&tsr->type, tsr->kind);
 	}
+	/* Or check if it's a global variable */
+	else if (src_tsr.kind == TSR_KIND_GLOBAL_ADDR) {
+		u64 ip = dloc->ms->sym->start + dl->al.offset;
+		u64 addr = src_tsr.addr + reg_offset;
+		int offset;
+
+		if (!get_global_var_type(cu_die, dloc, ip, addr, &offset, &type_die) ||
+		    !die_get_member_type(&type_die, offset, &type_die)) {
+			invalidate_reg_state(tsr);
+			goto out_adjust;
+		}
+
+		tsr->type = type_die;
+		tsr->kind = TSR_KIND_TYPE;
+		tsr->offset = 0;
+		tsr->addr = 0;
+		tsr->ok = true;
+
+		if (src->multi_regs) {
+			pr_debug_dtp("ldr [%x] global (reg%d, reg%d) -> reg%d",
+				     insn_offset, src->reg1, src->reg2, dreg);
+		} else {
+			pr_debug_dtp("ldr [%x] global (reg%d) -> reg%d",
+				     insn_offset, sreg, dreg);
+		}
+		pr_debug_type_name(&tsr->type, tsr->kind);
+	}
 	/* Or try another register if any */
 	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
 		sreg = src->reg2;
@@ -498,10 +529,10 @@ static void update_store_insn_state(struct type_state *state,
 		if (stack) {
 			if (!stack->compound)
 				set_stack_state(stack, offset, tsr->kind,
-						&tsr->type, tsr->offset);
+						&tsr->type, tsr->offset, tsr->addr);
 		} else {
 			findnew_stack_state(state, offset, tsr->kind,
-					    &tsr->type, tsr->offset);
+					    &tsr->type, tsr->offset, tsr->addr);
 		}
 
 		if (dst->reg1 == fbreg) {
@@ -547,6 +578,7 @@ static void update_mov_insn_state(struct type_state *state,
 		tsr->kind = TSR_KIND_CONST;
 		tsr->imm_value = src->offset;
 		tsr->offset = 0;
+		tsr->addr = 0;
 		tsr->ok = true;
 
 		pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
@@ -563,6 +595,7 @@ static void update_mov_insn_state(struct type_state *state,
 	tsr->kind = state->regs[sreg].kind;
 	tsr->imm_value = state->regs[sreg].imm_value;
 	tsr->offset = state->regs[sreg].offset;
+	tsr->addr = state->regs[sreg].addr;
 	tsr->ok = state->regs[sreg].ok;
 
 	if (tsr->kind == TSR_KIND_TYPE || tsr->kind == TSR_KIND_POINTER)
@@ -641,6 +674,7 @@ static void update_add_insn_state(struct type_state *state,
 		tsr->type = src_tsr.type;
 		tsr->kind = src_tsr.kind;
 		tsr->offset = offset;
+		tsr->addr = 0;
 		tsr->ok = src_tsr.ok;
 
 		pr_debug_dtp("add [%x] address of %s%#x(reg%d) -> reg%d",
@@ -649,6 +683,16 @@ static void update_add_insn_state(struct type_state *state,
 
 		pr_debug_type_name(&tsr->type, tsr->kind);
 	}
+	/* Handle page-relative global address calculation */
+	else if (src_tsr.kind == TSR_KIND_GLOBAL_ADDR) {
+		tsr->kind = src_tsr.kind;
+		tsr->addr = src_tsr.addr + reg_offset;
+		tsr->offset = 0;
+		tsr->ok = true;
+
+		pr_debug_dtp("add [%x] global %#x(reg%d) -> reg%d\n",
+			     insn_offset, reg_offset, sreg, dreg);
+	}
 	/* Or try another register if any */
 	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
 		sreg = src->reg2;
@@ -656,8 +700,37 @@ static void update_add_insn_state(struct type_state *state,
 	}
 }
 
+static void update_adrp_insn_state(struct type_state *state,
+				   struct disasm_line *dl,
+				   struct annotated_op_loc *dst)
+{
+	struct type_state_reg *tsr;
+	u32 insn_offset = dl->al.offset;
+	int dreg = dst->reg1;
+
+	if (!has_reg_type(state, dreg))
+		return;
+
+	tsr = &state->regs[dreg];
+	tsr->copied_from = -1;
+
+	if (!dl->ops.source.addr) {
+		invalidate_reg_state(tsr);
+		return;
+	}
+
+	tsr->kind = TSR_KIND_GLOBAL_ADDR;
+	/* Partial page-relative address, finalized in next 'add/ldr' */
+	tsr->addr = dl->ops.source.addr;
+	tsr->offset = 0;
+	tsr->ok = true;
+
+	pr_debug_dtp("adrp [%x] global addr=%#"PRIx64" -> reg%d\n",
+		     insn_offset, tsr->addr, dreg);
+}
+
 static void update_insn_state_arm64(struct type_state *state,
-				    struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
+				    struct data_loc_info *dloc, Dwarf_Die *cu_die,
 				    struct disasm_line *dl)
 {
 	struct annotated_insn_loc loc;
@@ -673,6 +746,7 @@ static void update_insn_state_arm64(struct type_state *state,
 	 * the destination register itself to prevent incorrect type propagation.
 	 */
 	if (has_reg_type(state, dst->reg1) &&
+	    strcmp(dl->ins.name, "adrp") &&
 	    strcmp(dl->ins.name, "add") && strcmp(dl->ins.name, "mov") &&
 	    strncmp(dl->ins.name, "ld", 2) && strncmp(dl->ins.name, "st", 2)) {
 		pr_debug_dtp("%s [%x] invalidate reg%d\n",
@@ -681,6 +755,11 @@ static void update_insn_state_arm64(struct type_state *state,
 		return;
 	}
 
+	if (!strcmp(dl->ins.name, "adrp")) {
+		update_adrp_insn_state(state, dl, dst);
+		return;
+	}
+
 	if (!strcmp(dl->ins.name, "add")) {
 		update_add_insn_state(state, dl, src, dst);
 		return;
@@ -694,7 +773,7 @@ static void update_insn_state_arm64(struct type_state *state,
 
 	/* Memory to register transfers */
 	if (!strncmp(dl->ins.name, "ld", 2)) {
-		update_load_insn_state(state, dloc, dl, src, dst);
+		update_load_insn_state(state, dloc, cu_die, dl, src, dst);
 		return;
 	}
 
diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
index ef84b9593a39..9977ca079143 100644
--- a/tools/perf/util/annotate-arch/annotate-x86.c
+++ b/tools/perf/util/annotate-arch/annotate-x86.c
@@ -772,10 +772,12 @@ static void update_insn_state_x86(struct type_state *state,
 				 */
 				if (!stack->compound)
 					set_stack_state(stack, offset, tsr->kind,
-							&tsr->type, tsr->offset);
+							&tsr->type, tsr->offset,
+							tsr->addr);
 			} else {
 				findnew_stack_state(state, offset, tsr->kind,
-						    &tsr->type, tsr->offset);
+						    &tsr->type, tsr->offset,
+						    tsr->addr);
 			}
 
 			if (dst->reg1 == fbreg) {
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index c51b20a7af9b..6fa5cd373a46 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -70,6 +70,9 @@ void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
 	case TSR_KIND_CANARY:
 		pr_info(" stack canary\n");
 		return;
+	case TSR_KIND_GLOBAL_ADDR:
+		pr_info(" global address\n");
+		return;
 	case TSR_KIND_TYPE:
 	default:
 		break;
@@ -590,7 +593,7 @@ struct type_state_stack *find_stack_state(struct type_state *state,
 }
 
 void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
-			    Dwarf_Die *type_die, int ptr_offset)
+			    Dwarf_Die *type_die, int ptr_offset, u64 addr)
 {
 	int tag;
 	Dwarf_Word size;
@@ -607,6 +610,7 @@ void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
 	stack->offset = offset;
 	stack->ptr_offset = ptr_offset;
 	stack->kind = kind;
+	stack->addr = addr;
 
 	if (kind == TSR_KIND_POINTER) {
 		stack->compound = false;
@@ -629,18 +633,18 @@ void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
 struct type_state_stack *findnew_stack_state(struct type_state *state,
 						    int offset, u8 kind,
 						    Dwarf_Die *type_die,
-						    int ptr_offset)
+						    int ptr_offset, u64 addr)
 {
 	struct type_state_stack *stack = find_stack_state(state, offset);
 
 	if (stack) {
-		set_stack_state(stack, offset, kind, type_die, ptr_offset);
+		set_stack_state(stack, offset, kind, type_die, ptr_offset, addr);
 		return stack;
 	}
 
 	stack = malloc(sizeof(*stack));
 	if (stack) {
-		set_stack_state(stack, offset, kind, type_die, ptr_offset);
+		set_stack_state(stack, offset, kind, type_die, ptr_offset, addr);
 		list_add(&stack->list, &state->stack_vars);
 	}
 	return stack;
@@ -935,7 +939,7 @@ static void update_var_state(struct type_state *state, struct data_loc_info *dlo
 				continue;
 
 			findnew_stack_state(state, offset, TSR_KIND_TYPE,
-					    &mem_die, /*ptr_offset=*/0);
+					    &mem_die, /*ptr_offset=*/0, /*addr=*/0);
 
 			if (var->reg == state->stack_reg) {
 				pr_debug_dtp("var [%"PRIx64"] %#x(reg%d)",
@@ -1278,6 +1282,25 @@ static enum type_match_result check_matching_type(struct type_state *state,
 		if (dloc->op->offset < 0 && reg != state->stack_reg && reg != dloc->fbreg)
 			goto check_kernel;
 	}
+
+	if (state->regs[reg].kind == TSR_KIND_GLOBAL_ADDR) {
+		u64 var_addr = state->regs[reg].addr + dloc->op->offset;
+		int var_offset;
+
+		pr_debug_dtp("global addr");
+
+		/*
+		 * The register holds the address of a global variable. Try to
+		 * find the variable by the address and get its type.
+		 */
+		if (get_global_var_type(cu_die, dloc, dloc->ip, var_addr,
+					&var_offset, type_die)) {
+			dloc->type_offset = var_offset;
+			return PERF_TMR_OK;
+		}
+		/* No need to retry global variables */
+		return PERF_TMR_BAIL_OUT;
+	}
 check_non_register:
 	if (reg == dloc->fbreg || reg == state->stack_reg) {
 		struct type_state_stack *stack;
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index 453e13bbe3e2..a3e1ef7097f0 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -37,6 +37,7 @@ enum type_state_kind {
 	TSR_KIND_PERCPU_POINTER,
 	TSR_KIND_POINTER,
 	TSR_KIND_CANARY,
+	TSR_KIND_GLOBAL_ADDR,
 };
 
 /**
@@ -187,6 +188,7 @@ struct type_state_reg {
 	u64 lifetime_end;
 	u8 kind;
 	u8 copied_from;
+	u64 addr;
 };
 
 /* Type information in a stack location, dynamically allocated */
@@ -199,6 +201,7 @@ struct type_state_stack {
 	int size;
 	bool compound;
 	u8 kind;
+	u64 addr;
 };
 
 /*
@@ -253,9 +256,9 @@ bool has_reg_type(struct type_state *state, int reg);
 struct type_state_stack *findnew_stack_state(struct type_state *state,
 						int offset, u8 kind,
 						Dwarf_Die *type_die,
-						int ptr_offset);
+						int ptr_offset, u64 addr);
 void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
-				Dwarf_Die *type_die, int ptr_offset);
+				Dwarf_Die *type_die, int ptr_offset, u64 addr);
 struct type_state_stack *find_stack_state(struct type_state *state,
 						int offset);
 void invalidate_reg_state(struct type_state_reg *reg);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 20/21] perf annotate-arm64: Support per-cpu variable access tracking
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (18 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 19/21] perf annotate-arm64: Support 'adrp' instruction to track global variables Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:18   ` sashiko-bot
  2026-07-01  3:53 ` [PATCH v3 21/21] perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer Tengda Wu
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Extend update_insn_state() for arm64 to handle per-cpu variable
addressing.

On arm64, per-cpu variables are accessed by adding a per-cpu offset
(typically from the '__per_cpu_offset' array) to the address of a global
variable or a local variable with '__percpu' modifier(e.g.,
's32 __percpu *counters' in struct percpu_counter). This results in
instruction patterns like:

  ldr   x0, [x6, x5]   // Pattern A: direct load per-cpu instance
  add   x0, x6, x5     // Pattern B: compute per-cpu address

where x6 holds the per-cpu offset retrieved from:

  adrp  x4, <page>
  add   x4, x4, #offset          // x4 = &__per_cpu_offset
  ldr   x6, [x4, w0, sxtw #3]    // x6 = __per_cpu_offset[cpu]

and x5 is one of the following:

  case 1: global variable
    adrp  x5, <page>
    add   x5, x5, #offset        // x5 = &global_var

  case 2: local variable with '__percpu' modifier
    ldr   x1, [x25, #32]         // x1 = local_percpu_ptr

To handle such cases:

1. Identify per-cpu base initialization: Detect 'adrp+add' pairs that
   resolve to the '__per_cpu_offset' symbol and mark the destination
   register as TSR_KIND_PERCPU_BASE.
2. Propagate type information: During subsequent 'ldr' or 'add'
   instructions, if one register is TSR_KIND_PERCPU_BASE, attempt to
   resolve the type from the other register.

A real-world example is shown below:

  ffff8000808f2d28 <cppc_set_perf>:
  ffff8000808f2d38:  adrp  x2, ffff800082033000
  ffff8000808f2d3c:  add   x5, x2, #0x3f8         // x5 = &__per_cpu_offset
  ffff8000808f2d44:  adrp  x2, ffff800081f73000
  ffff8000808f2d48:  add   x2, x2, #0x6b8         // x2 = &cpu_pcc_subspace_idx
  ffff8000808f2d6c:  ldr   x5, [x5, w0, sxtw #3]  // x5 = __per_cpu_offset[cpu]
* ffff8000808f2d80:  ldr   w23, [x5, x2]          // per_cpu(cpu_pcc_subspace_idx, cpu)

Before this commit, the tracker could not link x5 back to a per-cpu
context, resulting in an incorrect data type resolution:

  adrp [10] global addr=0xffff800082033000 -> reg2
  add [14] global 0x3f8(reg2) -> reg5
  adrp [1c] global addr=0xffff800081f73000 -> reg2
  add [20] global 0x6b8(reg2) -> reg2
  ldr [44] global (reg5, reg0) -> reg5 type='long unsigned int[]' size=0x1000
  chk [58] reg5 offset=0 ok=1 kind=1 (long unsigned int[]) : Good!
  found by insn track: 0(reg5, reg2) type-offset=0
  final result:  type='long unsigned int' size=0x8

After this commit, the tracker correctly identifies the per-cpu flow and
resolves the actual variable type:

  ldr [44] global (reg5, reg0) -> reg5 percpu base
  chk [58] reg5 offset=0 ok=1 kind=2 percpu var : retry
  chk [58] reg2 offset=0 ok=1 kind=7 global addr : Good!
  found by insn track: 0(reg5, reg2) type-offset=0
  final result:  type='int' size=0x4

Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 60 ++++++++++++++++++-
 tools/perf/util/annotate-data.c               | 24 +++++++-
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index 6f96e75d313d..ec6fd59d51a2 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -462,6 +462,15 @@ static void update_load_insn_state(struct type_state *state,
 		u64 ip = dloc->ms->sym->start + dl->al.offset;
 		u64 addr = src_tsr.addr + reg_offset;
 		int offset;
+		u8 kind;
+		const char *var_name = NULL;
+
+		/* it might be per-cpu offset */
+		if (get_global_var_info(dloc, addr, &var_name, &offset) &&
+		    !strcmp(var_name, "__per_cpu_offset"))
+			kind = TSR_KIND_PERCPU_BASE;
+		else
+			kind = TSR_KIND_TYPE;
 
 		if (!get_global_var_type(cu_die, dloc, ip, addr, &offset, &type_die) ||
 		    !die_get_member_type(&type_die, offset, &type_die)) {
@@ -470,7 +479,7 @@ static void update_load_insn_state(struct type_state *state,
 		}
 
 		tsr->type = type_die;
-		tsr->kind = TSR_KIND_TYPE;
+		tsr->kind = kind;
 		tsr->offset = 0;
 		tsr->addr = 0;
 		tsr->ok = true;
@@ -484,6 +493,28 @@ static void update_load_insn_state(struct type_state *state,
 		}
 		pr_debug_type_name(&tsr->type, tsr->kind);
 	}
+	/* Or check if it's a per-cpu access */
+	else if (src_tsr.kind == TSR_KIND_PERCPU_BASE) {
+		int reg2;
+
+		if (!src->multi_regs || src->reg1 == src->reg2 ||
+		    sreg == src->reg2 /* retried */) {
+			invalidate_reg_state(tsr);
+			goto out_adjust;
+		}
+
+		reg2 = src->reg2;
+		if (!has_reg_type(state, reg2) || !state->regs[reg2].ok ||
+		    (state->regs[reg2].kind != TSR_KIND_GLOBAL_ADDR &&
+		     state->regs[reg2].kind != TSR_KIND_TYPE)) {
+			invalidate_reg_state(tsr);
+			goto out_adjust;
+		} else {
+			/* Treat percpu as array: resolve type from reg2 */
+			sreg = src->reg2;
+			goto retry;
+		}
+	}
 	/* Or try another register if any */
 	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
 		sreg = src->reg2;
@@ -693,6 +724,33 @@ static void update_add_insn_state(struct type_state *state,
 		pr_debug_dtp("add [%x] global %#x(reg%d) -> reg%d\n",
 			     insn_offset, reg_offset, sreg, dreg);
 	}
+	/* Handle per-cpu base address calculation for per-cpu variables */
+	else if (src_tsr.kind == TSR_KIND_PERCPU_BASE) {
+		/*
+		 * A per-cpu base acts like an array index. Adding it to
+		 * a global variable or typed register should preserve
+		 * the original variable's type. Inherit the type from
+		 * the typed register.
+		 */
+		if (!src->multi_regs || !has_reg_type(state, src->reg2) ||
+		    !state->regs[src->reg2].ok ||
+		    (state->regs[src->reg2].kind != TSR_KIND_GLOBAL_ADDR &&
+		     state->regs[src->reg2].kind != TSR_KIND_TYPE)) {
+			invalidate_reg_state(tsr);
+			return;
+		}
+
+		tsr->type = state->regs[src->reg2].type;
+		tsr->kind = state->regs[src->reg2].kind;
+		tsr->offset = state->regs[src->reg2].offset;
+		tsr->addr = state->regs[src->reg2].addr;
+		tsr->ok = state->regs[src->reg2].ok;
+
+		pr_debug_dtp("add [%x] percpu (reg%d) -> reg%d",
+			     insn_offset, sreg, dreg);
+		pr_debug_type_name(&tsr->type, tsr->kind);
+		return;
+	}
 	/* Or try another register if any */
 	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
 		sreg = src->reg2;
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 6fa5cd373a46..b0bf0437168b 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1252,11 +1252,33 @@ static enum type_match_result check_matching_type(struct type_state *state,
 	}
 
 	if (state->regs[reg].kind == TSR_KIND_PERCPU_BASE) {
-		u64 var_addr = dloc->op->offset;
+		u64 var_addr;
 		int var_offset;
 
 		pr_debug_dtp("percpu var");
 
+		if (arch__is_arm64(dloc->arch)) {
+			int reg2;
+
+			if (!dloc->op->multi_regs ||
+			    dloc->op->reg1 == dloc->op->reg2 || !retry)
+				return PERF_TMR_BAIL_OUT;
+
+			reg2 = dloc->op->reg2;
+			if (!has_reg_type(state, reg2) ||
+			    !state->regs[reg2].ok ||
+			    (state->regs[reg2].kind != TSR_KIND_GLOBAL_ADDR &&
+			     state->regs[reg2].kind != TSR_KIND_TYPE))
+				return PERF_TMR_NO_TYPE;
+
+			pr_debug_dtp(" : retry\n");
+			retry = false;
+			reg = reg2;
+			goto again;
+		}
+
+		var_addr = dloc->op->offset;
+
 		if (dloc->op->multi_regs) {
 			int reg2 = dloc->op->reg2;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 21/21] perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer
  2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
                   ` (19 preceding siblings ...)
  2026-07-01  3:53 ` [PATCH v3 20/21] perf annotate-arm64: Support per-cpu variable access tracking Tengda Wu
@ 2026-07-01  3:53 ` Tengda Wu
  2026-07-01  4:16   ` sashiko-bot
  20 siblings, 1 reply; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  3:53 UTC (permalink / raw)
  To: Namhyung Kim, james.clark, xueshuai, Li Huafei
  Cc: Peter Zijlstra, leo.yan, Ian Rogers, Kim Phillips, Mark Rutland,
	Arnaldo Carvalho de Melo, Ingo Molnar, Bill Wendling,
	Nick Desaulniers, Alexander Shishkin, Adrian Hunter, Zecheng Li,
	linux-perf-users, linux-kernel, llvm, Tengda Wu

Extend update_insn_state() for arm64 to handle the 'mrs' instruction,
enabling the tracking of the 'current' task pointer in the kernel.

On arm64, the kernel uses the 'sp_el0' system register to store the
address of the currently executing 'struct task_struct'. This is
typically accessed via the 'get_current()' inline function, resulting
in the instruction 'mrs xN, sp_el0'.

To resolve the data type of the target register, first verify the
access is to 'sp_el0' within a kernel DSO. Then, locate the
'get_current()' inline function's DWARF Die at the current PC and
extract its return type (which is 'struct task_struct *').

Introduce a global variable 'task_struct_dieoff' to store the DWARF offset
of this type. This is particularly important because the compiler-generated
stack canary check code (which loads from 'current->stack_canary')
often exists in code sections or leaf functions where the local
Compilation Unit (CU) lacks a full 'struct task_struct' definition.
Caching the offset allows 'perf annotate' to consistently resolve
task-related fields across the entire kernel binary.

A real-world example is shown below:

  ffff8000800deee8 <kthread_blkcg>:
  ffff8000800deef0:  mrs  x0, sp_el0    // x0 = current
* ffff8000800deef4:  ldr  w1, [x0, #44]

Before this commit, the type flow starts with no information:

  chk [c] reg0 offset=0x2c ok=0 kind=0 cfa : no type information
  final result: no type information

After this commit, the tracker identifies the 'current' pointer
from the system register:

  mrs [8] sp_el0 -> reg0 type='struct task_struct*'
  chk [c] reg0 offset=0x2c ok=1 kind=1 (struct task_struct*) : Good!
  found by insn track: 0x2c(reg0) type-offset=0x2c
  final result: type='struct task_struct'

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../perf/util/annotate-arch/annotate-arm64.c  | 84 ++++++++++++++++++-
 1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
index ec6fd59d51a2..89e3997f6721 100644
--- a/tools/perf/util/annotate-arch/annotate-arm64.c
+++ b/tools/perf/util/annotate-arch/annotate-arm64.c
@@ -14,6 +14,8 @@
 #include "../debug.h"
 #include "../map.h"
 #include "../symbol.h"
+#include "../dso.h"
+#include "../strbuf.h"
 
 struct arch_arm64 {
 	struct arch arch;
@@ -787,6 +789,81 @@ static void update_adrp_insn_state(struct type_state *state,
 		     insn_offset, tsr->addr, dreg);
 }
 
+static Dwarf_Off task_struct_dieoff;
+
+static void update_mrs_insn_state(struct type_state *state,
+				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
+				  struct disasm_line *dl,
+				  struct annotated_op_loc *dst)
+{
+	struct type_state_reg *tsr;
+	Dwarf_Die type_die;
+	u32 insn_offset = dl->al.offset;
+	int dreg = dst->reg1;
+	Dwarf_Die func_die;
+	Dwarf_Attribute attr;
+	u64 ip, pc;
+
+	if (!has_reg_type(state, dreg))
+		return;
+
+	tsr = &state->regs[dreg];
+	tsr->copied_from = -1;
+
+	/* Handle case difference: LLVM (SP_EL0) vs objdump (sp_el0) */
+	if (!dso__kernel(map__dso(dloc->ms->map)) ||
+	    strcasecmp(dl->ops.source.raw, "sp_el0")) {
+		invalidate_reg_state(tsr);
+		return;
+	}
+
+	ip = dloc->ms->sym->start + dl->al.offset;
+	pc = map__rip_2objdump(dloc->ms->map, ip);
+
+	if (!task_struct_dieoff ||
+	    !dwarf_offdie(dloc->di->dbg, task_struct_dieoff, &type_die)) {
+		struct strbuf sb;
+		char *type_name;
+		/*
+		 * Find the inline function 'get_current()' Dwarf_Die
+		 * and obtain its return value data type, which should
+		 * be 'struct task_struct*'.
+		 */
+		if (!die_find_inlinefunc(cu_die, pc, &func_die) ||
+			!dwarf_attr_integrate(&func_die, DW_AT_type, &attr) ||
+			!dwarf_formref_die(&attr, &type_die)) {
+			invalidate_reg_state(tsr);
+			return;
+		}
+
+		strbuf_init(&sb, 32);
+		die_get_typename_from_type(&type_die, &sb);
+		type_name = strbuf_detach(&sb, NULL);
+
+		if (!type_name || strcmp(type_name, "struct task_struct*")) {
+			invalidate_reg_state(tsr);
+			return;
+		}
+
+		/*
+		 * Cache the 'struct task_struct*' die offset globally.
+		 * This allows us to resolve stack canary accesses even
+		 * in CUs that lack a full task_struct definition (e.g.,
+		 * compiler-generated entry/exit code).
+		 */
+		task_struct_dieoff = dwarf_dieoffset(&type_die);
+	}
+
+	tsr->type = type_die;
+	tsr->kind = TSR_KIND_TYPE;
+	tsr->offset = 0;
+	tsr->addr = 0;
+	tsr->ok = true;
+
+	pr_debug_dtp("mrs [%x] sp_el0 -> reg%d", insn_offset, dreg);
+	pr_debug_type_name(&type_die, tsr->kind);
+}
+
 static void update_insn_state_arm64(struct type_state *state,
 				    struct data_loc_info *dloc, Dwarf_Die *cu_die,
 				    struct disasm_line *dl)
@@ -804,7 +881,7 @@ static void update_insn_state_arm64(struct type_state *state,
 	 * the destination register itself to prevent incorrect type propagation.
 	 */
 	if (has_reg_type(state, dst->reg1) &&
-	    strcmp(dl->ins.name, "adrp") &&
+	    strcmp(dl->ins.name, "mrs") && strcmp(dl->ins.name, "adrp") &&
 	    strcmp(dl->ins.name, "add") && strcmp(dl->ins.name, "mov") &&
 	    strncmp(dl->ins.name, "ld", 2) && strncmp(dl->ins.name, "st", 2)) {
 		pr_debug_dtp("%s [%x] invalidate reg%d\n",
@@ -813,6 +890,11 @@ static void update_insn_state_arm64(struct type_state *state,
 		return;
 	}
 
+	if (!strcmp(dl->ins.name, "mrs")) {
+		update_mrs_insn_state(state, dloc, cu_die, dl, dst);
+		return;
+	}
+
 	if (!strcmp(dl->ins.name, "adrp")) {
 		update_adrp_insn_state(state, dl, dst);
 		return;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 04/21] perf annotate-arm64: Generalize arm64_mov__parse to support more instructions
  2026-07-01  3:53 ` [PATCH v3 04/21] perf annotate-arm64: Generalize arm64_mov__parse to support more instructions Tengda Wu
@ 2026-07-01  4:03   ` sashiko-bot
  2026-07-01  6:57     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:03 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The parsing logic permanently corrupts the raw instruction string buffer (`ops->raw`) when restoring the `<` character after extracting a source symbol name.
--

commit 13742d589ccf5a86dcfd0991f541a8f1d46f7e0f
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Generalize arm64_mov__parse to support more instructions

This commit generalizes arm64_mov__parse() to parse source and target operands
for a wider variety of arm64 instructions. It utilizes arm64__check_multi_regs()
to identify multiple registers while attempting to preserve parsing behavior for
instructions containing the addr <symbol> suffix.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index 33080fdca125f..a1667237114a6 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c

[ ... ]

> @@ -14,59 +16,129 @@
>  static int arm64_mov__parse(const struct arch *arch,
>  			    struct ins_operands *ops,
>  			    struct map_symbol *ms __maybe_unused,
>  			    struct disasm_line *dl __maybe_unused)
>  {

[ ... ]

> +	/* Parse a symbol followed by an address, if present */
>  	s = strchr(endptr, '<');
>  	if (s == NULL)
> -		goto out_free_target;
> -	endptr = strchr(s + 1, '>');
> +		return 0;
> +	endptr = strrchr(s + 1, '>');
>  	if (endptr == NULL)
> -		goto out_free_target;
> +		return 0;
>  
>  	*endptr = '\0';
>  	*s = ' ';
> -	ops->target.name = strdup(s);
> +	s = skip_spaces(++s);
> +	ops->source.name = strdup(s);
>  	*s = '<';
>  	*endptr = '>';

[Severity: Medium]
Does this code overwrite the first character of the symbol name?

Because s is advanced by skip_spaces(++s), s now points to the first character
of the symbol name. When *s = '<' is executed immediately after, it appears to
corrupt the ops->raw instruction string buffer instead of restoring the '<'
character at its original position. 

This could result in a visual glitch in the fallback raw instruction view
where the first character of the symbol name is overwritten (for example,
0x1000 <sym> becomes 0x1000  <ym>).

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=4

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 11/21] perf auxtrace: Set default period to 1 for PERF_ITRACE_PERIOD_INSTRUCTIONS type
  2026-07-01  3:53 ` [PATCH v3 11/21] perf auxtrace: Set default period to 1 for PERF_ITRACE_PERIOD_INSTRUCTIONS type Tengda Wu
@ 2026-07-01  4:05   ` sashiko-bot
  0 siblings, 0 replies; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:05 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The patch introduces a regression where multiple `--itrace` options silently overwrite a previously user-configured instruction period.
--

commit 37214515750cb3145373998c5f8b034630d80bf7
Author: Tengda Wu <wutengda@huaweicloud.com>

perf auxtrace: Set default period to 1 for PERF_ITRACE_PERIOD_INSTRUCTIONS type

This commit sets the default instruction period to 1 when a period is not 
explicitly specified for the PERF_ITRACE_PERIOD_INSTRUCTIONS type. This fixes 
an issue where a zero period resulted in zero percent values in perf annotate 
output.

> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
[ ... ]
> @@ -1759,6 +1759,12 @@ int itrace_do_parse_synth_opts(struct itrace_synth_opts *synth_opts,
>  			synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
>  	}
>  
> +	if (!period_set &&
> +	    synth_opts->period_type == PERF_ITRACE_PERIOD_INSTRUCTIONS) {
> +		/* Indicates a sample is taken for every instruction. */
> +		synth_opts->period = 1;
> +	}
> +
>  	return 0;

[Severity: Medium]
Will this silently overwrite a user-configured instruction period if
multiple --itrace options are used?

Since period_set is a local variable initialized to false on every call to
itrace_do_parse_synth_opts(), if a user provides an explicit period in one
flag and later provides another flag without a period:

First call (e.g., --itrace=i1000i):
    period_set = true;
    synth_opts->period = 1000;

Second call (e.g., --itrace=M):
    period_set = false;
    // synth_opts->period is overwritten to 1 here

Could this lead to extreme performance overhead during decoding by
unexpectedly dropping the sampling period to 1?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=11

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 03/21] perf llvm: Fix arm64 adrp instruction disassembly mismatch with objdump
  2026-07-01  3:53 ` [PATCH v3 03/21] perf llvm: Fix arm64 adrp instruction " Tengda Wu
@ 2026-07-01  4:05   ` sashiko-bot
  2026-07-01  6:45     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:05 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Missing explicit inclusion of <string.h> for POSIX string functions.
--

commit c3571a268605b8e2ad28ebe5943729438021e5fe
Author: Tengda Wu <wutengda@huaweicloud.com>

perf llvm: Fix arm64 adrp instruction disassembly mismatch with objdump

This commit addresses an inconsistency between libllvm and objdump when
parsing arm64 'adrp' instructions. It extracts the target address during
callbacks and formats it as an absolute address with symbol names in
symbol__disassemble_llvm().

> diff --git a/tools/perf/util/llvm.c b/tools/perf/util/llvm.c
> index a0deb742a7331..533d47e8084d7 100644
> --- a/tools/perf/util/llvm.c
> +++ b/tools/perf/util/llvm.c

[ ... ]

> @@ -227,6 +241,42 @@ int symbol__disassemble_llvm(const char *filename, struct symbol *sym,
>  				free(name);
>  			}
>  		}
> +		if (storage.pcrel_adrp_addr != 0) {
> +			/*
> +			 * ADRP (Address Page) instructions encode a 21-bit signed
> +			 * immediate offset relative to the current PC's page.
> +			 *
> +			 * To maintain consistency with standard objdump output,
> +			 * we truncate the raw encoded immediate at the comma
> +			 * and replace it with the resolved absolute page address.
> +			 *
> +			 * Example conversion:
> +			 * From: adrp x18, 8014
> +			 * To:   adrp x18, ffff800081f5f000 <this_cpu_vector>
> +			 */
> +			char *name;
> +			char *s = strchr(disasm_buf, ',');

[Severity: Low]
Does this file need to explicitly include <string.h>?

This patch introduces a new use of strchr() here in
symbol__disassemble_llvm(), but tools/perf/util/llvm.c lacks a direct
inclusion of <string.h> at the top of the file.

While this might compile under glibc due to implicit header inclusion,
missing explicit POSIX headers can cause build failures on systems
using musl libc, which enforces strict namespace separation.

> +
> +			if (s == NULL)
> +				goto err;
> +
> +			s++;
> +			*s = '\0';
> +			disasm_len = strlen(disasm_buf);

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=3

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 10/21] perf annotate: Deduplicate overlapping ARM SPE events for data type profiling
  2026-07-01  3:53 ` [PATCH v3 10/21] perf annotate: Deduplicate overlapping ARM SPE events for data type profiling Tengda Wu
@ 2026-07-01  4:06   ` sashiko-bot
  0 siblings, 0 replies; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:06 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Invalid union access causes severe type punning, false positive event dropping, and logic failure.
- [Medium] Unprotected dereference of potentially NULL `evsel->name` causing a segmentation fault.
--

commit 21f6f127e80c94aececac4065deebfce02da27b3
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate: Deduplicate overlapping ARM SPE events for data type profiling

When data type profiling is enabled on ARM SPE, multiple overlapping
events are synthesized for a single sampled instruction. This patch
implements automatic deduplication in process_sample_event() by retaining
only the "instructions" event when data type annotation is enabled and no
explicit itrace options are configured.

> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index 69cb72b2082a6..46b2bd3fb2013 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -283,9 +283,22 @@ static int process_sample_event(const struct perf_tool *tool,
>  				struct machine *machine)
>  {
>  	struct perf_annotate *ann = container_of(tool, struct perf_annotate, tool);
> +	struct evsel *evsel = sample->evsel;
>  	struct addr_location al;
>  	int ret = 0;
>  
> +	/*
> +	 * ARM SPE synthesizes multiple overlapping events (e.g., l1d-miss,
> +	 * tlb-access) for a single sampled instruction. When data type is
> +	 * enabled, counting all of these overlapping events would inflate
> +	 * the profile and distort the data type statistics. To avoid this,
> +	 * deduplicate by retaining only the comprehensive "instructions" event.
> +	 */
> +	if (ann->data_type && !ann->session->itrace_synth_opts->set &&
> +	    event->auxtrace_info.type == PERF_AUXTRACE_ARM_SPE &&

[Severity: Medium]
Since event is a union of all perf events, does checking
event->auxtrace_info.type on a PERF_RECORD_SAMPLE event read the sample data
payload as if it were an auxtrace type?

If a normal event's first 32-bit payload (like PID 4 when PERF_SAMPLE_TID
is the first element) equals 4 (PERF_AUXTRACE_ARM_SPE), could this falsely
match and silently drop the event?

> +	    strcmp(evsel->name, "instructions") != 0)

[Severity: Medium]
Could evsel->name be NULL here if the event is unnamed or if we are
processing a perf.data file missing the HEADER_EVENT_DESC feature?

Passing NULL to strcmp() would cause a segmentation fault. Would it be safer
to use the evsel__name(evsel) wrapper instead?

> +		return 0;
> +
>  	addr_location__init(&al);

[ ... ]

> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 552f063f126e6..e027e3b200bfb 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -403,6 +403,7 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
>  	event->sample.header.type = PERF_RECORD_SAMPLE;
>  	event->sample.header.misc = sample->cpumode;
>  	event->sample.header.size = sizeof(struct perf_event_header);
> +	event->auxtrace_info.type = PERF_AUXTRACE_ARM_SPE;

[Severity: Medium]
Writing to auxtrace_info.type modifies offset 8 of the event union, which
aliases the sample payload.

If trace injection is used, does perf_event__synthesize_sample() subsequently
overwrite this offset with the actual sample payload (such as sample->id),
erasing this magic value and breaking the deduplication entirely?

>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=10

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 08/21] perf annotate: Introduce extract_op_location callback for arch-specific parsing
  2026-07-01  3:53 ` [PATCH v3 08/21] perf annotate: Introduce extract_op_location callback for arch-specific parsing Tengda Wu
@ 2026-07-01  4:06   ` sashiko-bot
  2026-07-01  7:29     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:06 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Refactoring changes the error propagation behavior for memory operands with unrecognized registers, causing valid instructions to be incorrectly marked as bad.
- [Low] Missing explicit inclusion of `<stdlib.h>` for the POSIX libc function `strtol()`, breaking musl compatibility.
--

commit a79e03a1004b442317df26b7478d0a1daed1e563
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate: Introduce extract_op_location callback for arch-specific parsing

This patch refactors operand parsing by moving it from the central
annotate_get_insn_location() into architecture-specific callbacks. By
introducing extract_op_location callbacks for PowerPC and x86, it aims to
simplify the core logic and improve maintainability without making functional
changes.

> diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
> index 7e61365363938..6a4b4bdcd7a7c 100644
> --- a/tools/perf/util/annotate-arch/annotate-x86.c
> +++ b/tools/perf/util/annotate-arch/annotate-x86.c
> @@ -2,6 +2,7 @@
>  #include <string.h>
>  #include <linux/compiler.h>
>  #include <linux/zalloc.h>
> +#include <linux/ctype.h>
>  #include <assert.h>
>  #include <inttypes.h>
>  #include "../annotate-data.h"

[Severity: Low]
Does this file need an explicit include of <stdlib.h> for musl libc
compatibility?

It appears strtol() is used later in extract_op_location_x86() without
the required <stdlib.h> header being directly included here, which might
lead to implicit function declaration errors on musl-based systems.

[ ... ]

> @@ -809,6 +810,86 @@ static void update_insn_state_x86(struct type_state *state,
>  }
>  #endif
>  
> +/*
> + * Get register number and access offset from the given instruction.
> + * It assumes AT&T x86 asm format like OFFSET(REG).  Maybe it needs
> + * to revisit the format when it handles different architecture.
> + * Fills @reg and @offset when return 0.
> + */
> +static int extract_reg_offset(const struct arch *arch, const char *str,
> +			      struct annotated_op_loc *op_loc)
> +{

[ ... ]

> +static int extract_op_location_x86(const struct arch *arch,
> +				   struct disasm_line *dl __maybe_unused,
> +				   const char *op_str, int op_idx __maybe_unused,
> +				   struct annotated_op_loc *op_loc)
> +{
> +	const char *s = op_str;
> +	char *p = NULL;
> +
> +	if (op_str == NULL)
> +		return 0;
> +
> +	if (strchr(op_str, arch->objdump.memory_ref_char)) {
> +		op_loc->mem_ref = true;
> +		return extract_reg_offset(arch, op_str, op_loc);

[Severity: Medium]
Does returning the result of extract_reg_offset() here alter the error
propagation behavior?

In the previous annotate_get_insn_location() logic, the return value of
extract_reg_offset() was safely ignored:

        } else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
                op_loc->mem_ref = true;
                op_loc->multi_regs = multi_regs;
                extract_reg_offset(arch, insn_str, op_loc);
        } else {

By returning it here, extract_op_location_x86() might return -1 if the
register is unrecognized. This regression is now propagated back to the
caller in annotate_get_insn_location():

        if (arch->extract_op_location) {
                ret = arch->extract_op_location(arch, dl, insn_str, i, op_loc);
                if (ret)
                        return ret;
        }

Could this cause valid instructions to be completely skipped and
incorrectly marked as bad?

> +	}
> +
> +	/* FIXME: Handle other segment registers */
> +	if (!strncmp(op_str, "%gs:", 4)) {
> +		op_loc->segment = INSN_SEG_X86_GS;
> +		op_loc->offset = strtol(op_str + 4,
> +					&p, 0);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=8

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 06/21] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
  2026-07-01  3:53 ` [PATCH v3 06/21] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64 Tengda Wu
@ 2026-07-01  4:07   ` sashiko-bot
  2026-07-01  7:14     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:07 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] The zero registers ('wzr' and 'xzr') are incorrectly mapped to DWARF register 31.
- [Medium] The link register ('lr') alias is no longer recognized and will unexpectedly return -ENOENT.
- [Low] The functions strcmp and strtol are used without explicitly including <string.h> and <stdlib.h>.
--

commit cf10eca9678152514cc1354dc6997be3eec9f436
Author: Tengda Wu <wutengda@huaweicloud.com>

perf dwarf-regs: Adapt get_dwarf_regnum() for arm64

This commit replaces the static string table lookup for arm64 DWARF registers
with a programmatic implementation in __get_dwarf_regnum_arm64(). This allows
resolving register names directly into their corresponding DWARF numbers
during annotation.

> diff --git a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
> index 593ca7d4fccca..be55fc2a4f385 100644
> --- a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
> +++ b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <errno.h>
> +#include <ctype.h>

[Severity: Low]
Does omitting the explicit includes for <string.h> and <stdlib.h> break the
build on musl libc? 

This file now uses strcmp() and strtol() below, and musl strictly separates
declarations without relying on implicit inclusions.

>  #include <dwarf-regs.h>
>  #include "../../../arch/arm64/include/uapi/asm/perf_regs.h"

[ ... ]

> +int __get_dwarf_regnum_arm64(const char *name)
> +{
> +	int reg;
> +
> +	if (!strcmp(name, "sp") || !strcmp(name, "wzr") || !strcmp(name, "xzr"))
> +		return 31;

[Severity: Medium]
Does this incorrectly map the zero registers (wzr and xzr) to DWARF
register 31?

The AArch64 DWARF ABI assigns register 31 strictly to the stack pointer (sp),
and zero registers have no valid DWARF mapping.

Mapping wzr and xzr to 31 might cause perf annotate to conflate operations on
the zero register with stack pointer operations, leading to incorrect stack
variable tracking and type profiling statistics.

> +
> +	if (*name != 'x' && *name != 'w')
> +		return -ENOENT;

[Severity: Medium]
Will this strict prefix check reject the link register alias lr?

The previous static table lookup correctly mapped lr to register 30.
Because lr does not start with x or w, it appears this will now return
-ENOENT, which might break data type tracking for instructions that operate
on the link register.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=6

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 05/21] perf annotate-arm64: Handle load and store instructions
  2026-07-01  3:53 ` [PATCH v3 05/21] perf annotate-arm64: Handle load and store instructions Tengda Wu
@ 2026-07-01  4:07   ` sashiko-bot
  2026-07-01  7:03     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:07 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] arm64_ldst__parse fails to parse PC-relative load instructions, breaking operand extraction for literal pool loads.
- [Medium] Failure to strip inline comments before parsing operands allows comments to corrupt `multi_regs` evaluation.
--

commit 9939a67ebc615cf049eb33c68f83aa3646abb3f0
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Handle load and store instructions

This commit adds ldst_ops to parse data types and offsets for ARM64 memory
access instructions using regex prefix matching.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index a1667237114a6..451500def8c12 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
[ ... ]
> @@ -141,6 +142,104 @@ static const struct ins_ops arm64_mov_ops = {
[ ... ]
> +static int arm64_ldst__parse(const struct arch *arch,
> +			     struct ins_operands *ops,
> +			     struct map_symbol *ms __maybe_unused,
> +			     struct disasm_line *dl)
> +{
> +	char *s, *left, *right;
> +
> +	right = s = strchr(ops->raw, arch->objdump.memory_ref_char);
> +	if (!s)
> +		return -1;

[Severity: Medium]
Will this fail to parse PC-relative load instructions?

When processing a PC-relative load instruction (e.g., ldr x0, 4005c0 <sym>),
the 'ld' prefix matches the new ldst_insn regex in arch__new_arm64() and
routes it to arm64_ldst__parse().

Since PC-relative loads lack a bracket ('['), strchr() will return NULL,
causing the function to immediately return -1. Does this leave operands
unpopulated and break annotation features like data type tracking for
literal pool loads?

> +
> +	while (s > ops->raw && *s != ',')
> +		--s;
> +
> +	if (s == ops->raw)
> +		return -1;
> +
> +	*s = '\0';
> +	left = strdup(ops->raw);
> +
> +	*s = ',';
> +	if (!left)
> +		return -1;
> +
> +	right = strdup(right);
> +	if (!right) {
> +		zfree(&left);
> +		return -1;
> +	}
> +
> +	if (arm64__insn_is_source_first(dl)) {
> +		ops->source.raw = left;
> +		ops->source.mem_ref = false;
> +
> +		ops->target.raw = right;
> +		ops->target.mem_ref = true;
> +		ops->target.multi_regs = arm64__check_multi_regs(arch, ops->target.raw);

[Severity: Medium]
Does this code incorrectly include inline comments when evaluating multi_regs?

The string duplicated into 'right' includes trailing comments from the raw
operand string because they are not stripped prior to parsing.

If objdump outputs an inline comment containing a comma and a string
resembling a register name (for example: // read, x2), could
arm64__check_multi_regs() wrongly flag the operand as having multiple
registers and corrupt data type profiling?

> +	} else {
> +		ops->source.raw = right;
> +		ops->source.mem_ref = true;
> +		ops->source.multi_regs = arm64__check_multi_regs(arch, ops->source.raw);
> +
> +		ops->target.raw = left;
> +		ops->target.mem_ref = false;
> +	}
> +
> +	return 0;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=5

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/21] perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
  2026-07-01  3:53 ` [PATCH v3 02/21] perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump Tengda Wu
@ 2026-07-01  4:07   ` sashiko-bot
  2026-07-01  6:44     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:07 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Symbol resolution fails for relocated kernels (KASLR) and user-space shared libraries due to mismatched address spaces.
- [Low] Build failure due to undeclared `machine` variable and undefined `machine__normalized_is()` function in `capstone_init()`.
- [Low] Missing `<stdlib.h>` inclusion causes musl libc build failure.
- [Low] Usage of `ARM64_GRP_BRANCH_RELATIVE` causes build failures with Capstone versions prior to 5.0.
--

commit 1f9744f04e456b1209677658edb7f30a5b26e56d
Author: Tengda Wu <wutengda@huaweicloud.com>

perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump

The patch implements extended formatting for arm64 jump and adrp instructions
during symbol__disassemble_capstone() to match objdump's expected output style.
It adds instruction detail resolution in capstone_init() and formats the
output with correct symbol and offset mappings.

> diff --git a/tools/perf/util/capstone.c b/tools/perf/util/capstone.c
> index 9bba78ee0c5a2..a9c0a1d2eb036 100644
> --- a/tools/perf/util/capstone.c
> +++ b/tools/perf/util/capstone.c

[ ... ]

> @@ -218,6 +218,12 @@ static int capstone_init(uint16_t e_machine, csh *cs_handle, bool is64, bool is_
>  		 * on x86 by investigating instruction details.
>  		 */
>  		perf_cs_option(*cs_handle, CS_OPT_DETAIL, CS_OPT_ON);
> +	} else if (machine__normalized_is(machine, "arm64")) {

[Severity: Low]
Does this code cause a build failure? The variable machine is not declared
in capstone_init(), and the function machine__normalized_is() does not appear
to be defined in the codebase.

> +		/*
> +		 * Same as x86: arm64 needs instruction details to resolve
> +		 * symbolic addresses
> +		 */
> +		perf_cs_option(*cs_handle, CS_OPT_DETAIL, CS_OPT_ON);
>  	}
>  
>  	return 0;

[ ... ]

> +static bool is_pc_relative_insn(struct cs_insn *insn)
> +{
> +	int i;
> +
> +	if (insn->id == ARM64_INS_ADR || insn->id == ARM64_INS_ADRP)
> +		return true;
> +
> +	if (insn->detail == NULL)
> +		return false;
> +
> +	for (i = 0; i < insn->detail->groups_count; i++) {
> +		if (insn->detail->groups[i] == ARM64_GRP_JUMP ||
> +		    insn->detail->groups[i] == ARM64_GRP_CALL ||
> +		    insn->detail->groups[i] == ARM64_GRP_BRANCH_RELATIVE)

[Severity: Low]
Will this break compilation against Capstone versions prior to 5.0?
ARM64_GRP_BRANCH_RELATIVE was introduced in Capstone 5.0. Since the perf
tool's build feature detection doesn't mandate a specific Capstone version,
relying on this enum might cause build failures on older setups.

> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static void format_capstone_insn_arm64(struct cs_insn *insn, char *buf,
> +				       size_t len, struct annotate_args *args,
> +				       u64 addr)
> +{

[ ... ]

> +	if (is_pc_relative_insn(insn)) {
> +		/* Extract last immediate value as address */
> +		last_imm = strrchr(buf, '#');
> +		if (!last_imm)
> +			return;
> +
> +		orig_addr = strtoull(last_imm + 1, &endptr, 16);

[Severity: Low]
Can this cause a build failure on musl libc? The POSIX standard function
strtoull() is used here, but stdlib.h is not explicitly included in the
file. Relying on implicit inclusions often works on glibc but breaks on
musl libc.

> +		if (endptr == last_imm + 1)
> +			return;
> +
> +		/* Relocate map that contains the address */
> +		if (dso__kernel(map__dso(map))) {
> +			found_map = maps__find(map__kmaps(map), orig_addr);
> +			if (found_map == NULL)
> +				return;
> +			map = found_map;
> +		}
> +
> +		/* Convert it to map-relative address for search */
> +		addr = map__map_ip(map, orig_addr);

[Severity: High]
Does this code fail to resolve symbols for relocated kernels (KASLR) or
user-space shared libraries (PIE)?

The extracted orig_addr is in the objdump address space. However,
maps__find() and map__map_ip() expect absolute memory addresses. By omitting
the translation step via map__objdump_2mem(map, orig_addr) (which the x86
implementation handles correctly), KASLR relocated kernels will fail in
maps__find(), and user-space shared libraries will underflow in
map__map_ip().

> +
> +		sym = map__find_symbol(map, addr);
> +		if (sym == NULL) {
> +			map__put(found_map);
> +			return;
> +		}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=2

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/21] perf annotate-arm64: Implement extract_op_location() callback
  2026-07-01  3:53 ` [PATCH v3 09/21] perf annotate-arm64: Implement extract_op_location() callback Tengda Wu
@ 2026-07-01  4:10   ` sashiko-bot
  2026-07-01  7:36     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:10 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Post-index memory references with register offsets are incorrectly classified as SIGNED_OFFSET.
--

commit 1b03950d2edd24d57bf3a7ad6271aea1980c92e3
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Implement extract_op_location() callback

Implement the extract_op_location() callback for the arm64 architecture
to handle its specific assembly syntax and addressing modes, enabling data
type profiling for this architecture.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index 451500def8c12..77b6f4841c849 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
[ ... ]
> +static int extract_op_location_arm64(const struct arch *arch,
> +				     struct disasm_line *dl __maybe_unused,
> +				     const char *op_str, int op_idx __maybe_unused,
> +				     struct annotated_op_loc *op_loc)
> +{
> +	const char *s = op_str;
> +	char *p = NULL;
> +
> +	if (op_str == NULL)
> +		return 0;
[ ... ]
> +	/*
> +	 * Handle memory references (e.g., [x0, #8]), identify
> +	 * arm64 specific addressing modes
> +	 */
> +	if (*s == arch->objdump.memory_ref_char) {
> +		op_loc->mem_ref = true;
> +
> +		p = strchr(s, ']');
> +		if (p == NULL)
> +			return -1;
> +
> +		/* Pre-index: [base, #imm]! */
> +		if (p[1] == '!')
> +			op_loc->addr_mode = PERF_ADDR_MODE_PRE_INDEX;
> +		/* Post-index: [base], #imm */
> +		else if (p[1] == ',' && strchr(p + 1, arch->objdump.imm_char))
> +			op_loc->addr_mode = PERF_ADDR_MODE_POST_INDEX;

[Severity: Medium]
Does this incorrectly restrict post-index addressing classification to only
immediate offsets?

In ARM64, a comma immediately following the closing bracket (p[1] == ',') is
sufficient to identify a post-index addressing mode, regardless of whether
the offset is an immediate or a register (for example, Advanced SIMD
instructions like ld1 {v0.16b}, [x2], x5).

If the condition fails for register offsets, it appears it will incorrectly
fall through to PERF_ADDR_MODE_SIGNED_OFFSET. Could this cause the base
register write-back to be silently ignored in adjust_reg_index_state(),
leading to stale and corrupted type state tracking for the base register?

> +		/* Signed offset: [base{, #imm}] */
> +		else
> +			op_loc->addr_mode = PERF_ADDR_MODE_SIGNED_OFFSET;
> +
> +		s++;
> +	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=9

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 13/21] perf annotate-arm64: Enable instruction tracking support
  2026-07-01  3:53 ` [PATCH v3 13/21] perf annotate-arm64: Enable instruction tracking support Tengda Wu
@ 2026-07-01  4:12   ` sashiko-bot
  2026-07-01  7:56     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:12 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Missing function call register invalidation for ARM64 data type tracking.
- [Medium] Failure to invalidate secondary destination registers (`reg2`) in multi-register instructions.

Pre-existing issues:
- [High] Missing `multi_regs` evaluation for register operands in load/store instructions.
--

commit a1f858405ccf3b88519235d7e08cb42762d969ac
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Enable instruction tracking support

Enable basic instruction tracking for arm64 by implementing three essential
functions in the find_data_type_block() call path: arch_supports_insn_tracking,
init_type_state, and update_insn_state.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index 77b6f4841c849..48b1c94273eb6 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c

[ ... ]

> @@ -324,6 +326,31 @@ static int extract_op_location_arm64(const struct arch *arch,
>  	return 0;
>  }
>  
> +#ifdef HAVE_LIBDW_SUPPORT
> +static void update_insn_state_arm64(struct type_state *state,
> +				    struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
> +				    struct disasm_line *dl)
> +{
> +	struct annotated_insn_loc loc;
> +	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
> +	u32 insn_offset = dl->al.offset;
> +
> +	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
> +		return;
> +

[Severity: High]
Does this function miss a check for function calls?

When an instruction is a function call (e.g., bl or blr), this code does not
check ins__is_call(&dl->ins). As a result, it fails to invalidate the
caller-saved registers (x0-x17) and update the return type in x0.

While init_type_state() correctly initializes the caller_saved flags for
these registers, won't omitting the invalidation here cause subsequent
accesses to inherit incorrect type states across function calls?

> +	/*
> +	 * For unsupported instructions with a destination register, invalidate
> +	 * the destination register itself to prevent incorrect type propagation.
> +	 */
> +	if (has_reg_type(state, dst->reg1)) {
> +		pr_debug_dtp("%s [%x] invalidate reg%d\n",
> +			     dl->ins.name, insn_offset, dst->reg1);
> +		invalidate_reg_state(&state->regs[dst->reg1]);
> +		return;
> +	}

[Severity: Medium]
Does this fallback logic need to handle secondary destination registers?

When an unsupported instruction modifies multiple registers (e.g., casp),
this only checks and invalidates the primary destination register
(dst->reg1). 

If the instruction has a secondary register (where dst->multi_regs is
true), should we also invalidate dst->reg2 so it doesn't retain an
incorrect type state?

[Severity: High]
This is a pre-existing issue, but I noticed a potential problem in
arm64_ldst__parse() regarding multi-register instructions:

tools/perf/util/annotate-arch/annotate-arm64.c:arm64_ldst__parse() {
...
	if (arm64__insn_is_source_first(dl)) {
		ops->source.raw = left;
		ops->source.mem_ref = false;
		ops->target.raw = right;
		ops->target.mem_ref = true;
		ops->target.multi_regs = arm64__check_multi_regs(arch, ops->target.raw);
	}
...
}

Here, arm64__check_multi_regs() is only applied to the memory operand side
(ops->target in this case). The register operand side is left with
multi_regs = false. 

Because of this, extract_op_location_arm64() fails to parse the second
register (reg2) for instructions like ldp and stp, leaving it as -1. Won't
this cause the data type tracking logic to silently miss type updates and
queries for the secondary register?

> +}
> +#endif

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=13

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 14/21] perf annotate-arm64: Support load instruction tracking
  2026-07-01  3:53 ` [PATCH v3 14/21] perf annotate-arm64: Support load instruction tracking Tengda Wu
@ 2026-07-01  4:14   ` sashiko-bot
  2026-07-01  8:37     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:14 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Missing invalidation of the destination register on type resolution failure during load instruction tracking.
- [Medium] The data type profiling logic ignores the second destination register of load pair (ldp) instructions, leading to stale state retention.
--

commit 579f01582f34ef50d993c4cb489f025353f8dbcc
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Support load instruction tracking

This commit extends update_insn_state_arm64() to handle LDR instructions.
It tracks register state changes when data is loaded from memory to registers,
supporting signed offset, pre-index, and post-index addressing modes.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index 48b1c94273eb6..b48545f35620c 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c

[ ... ]

> +static void update_load_insn_state(struct type_state *state,
> +				   struct disasm_line *dl,
> +				   struct annotated_op_loc *src,
> +				   struct annotated_op_loc *dst)
> +{
> +	struct type_state_reg *tsr;
> +	struct type_state_reg src_tsr;
> +	Dwarf_Die type_die;
> +	u32 insn_offset = dl->al.offset;
> +	int reg_offset;
> +	int sreg = src->reg1;
> +	int dreg = dst->reg1;

[Severity: Medium]
Does this ignore the second destination register for load pair (ldp)
instructions?

If the instruction writes to two destination registers (e.g.,
ldp x0, x1, [x2]), it seems dst->reg2 is not checked or invalidated here,
which could leave it with stale type information. Does the parsing logic in
arm64_ldst__parse() also need to set ops->target.multi_regs for this to be
properly tracked?

> +
> +	if (!has_reg_type(state, dreg))
> +		goto out_adjust;
> +
> +	tsr = &state->regs[dreg];
> +	tsr->copied_from = -1;
> +
> +retry:
> +	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok) {
> +		invalidate_reg_state(tsr);
> +		return;
> +	}
> +
> +	src_tsr = state->regs[sreg];
> +	reg_offset = get_reg_index_offset(src);
> +
> +	/* Dereference the pointer if it has one */
> +	if (src_tsr.kind == TSR_KIND_TYPE &&
> +	    die_deref_ptr_type(&src_tsr.type,
> +			       src_tsr.offset + reg_offset, &type_die)) {
> +		tsr->type = type_die;
> +		tsr->kind = TSR_KIND_TYPE;
> +		tsr->offset = 0;
> +		tsr->ok = true;
> +
> +		if (src->multi_regs) {
> +			pr_debug_dtp("ldr [%x] %#x(reg%d, reg%d) -> reg%d",
> +				     insn_offset, reg_offset, src->reg1,
> +				     src->reg2, dreg);
> +		} else {
> +			pr_debug_dtp("ldr [%x] %#x(reg%d) -> reg%d",
> +				     insn_offset, reg_offset, sreg, dreg);
> +		}
> +		pr_debug_type_name(&tsr->type, tsr->kind);
> +	}
> +	/* Or try another register if any */
> +	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
> +		sreg = src->reg2;
> +		goto retry;
> +	}

[Severity: Medium]
If the type resolution fails and execution bypasses both of these if/else
blocks, it falls through to out_adjust.

Should the destination register be invalidated here?

It looks like tsr->ok could remain true from its previous state, leaving it
with incorrect type information.

> +
> +out_adjust:
> +	adjust_reg_index_state(state, src, "ldr", insn_offset);
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=14

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 19/21] perf annotate-arm64: Support 'adrp' instruction to track global variables
  2026-07-01  3:53 ` [PATCH v3 19/21] perf annotate-arm64: Support 'adrp' instruction to track global variables Tengda Wu
@ 2026-07-01  4:15   ` sashiko-bot
  2026-07-01  8:48     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:15 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Stale `type` field in register state causes `TSR_KIND_GLOBAL_ADDR` to be corrupted across stack spills.
--

commit ddfb698eef878cf476210d72e082d8b79c811ecd
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Support 'adrp' instruction to track global variables

This patch extends update_insn_state() for arm64 to track global variable
types calculated via page-relative addressing. It introduces
TSR_KIND_GLOBAL_ADDR to represent the partial page-aligned target address
produced by adrp instructions. The address is later combined with the offset
from a subsequent add or ldr instruction to resolve the variable's type.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index 1fed188117192..6f96e75d313dd 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
[ ... ]
> @@ -656,8 +700,37 @@ static void update_add_insn_state(struct type_state *state,
>  	}
>  }
>  
> +static void update_adrp_insn_state(struct type_state *state,
> +				   struct disasm_line *dl,
> +				   struct annotated_op_loc *dst)
> +{
> +	struct type_state_reg *tsr;
> +	u32 insn_offset = dl->al.offset;
> +	int dreg = dst->reg1;
> +
> +	if (!has_reg_type(state, dreg))
> +		return;
> +
> +	tsr = &state->regs[dreg];
> +	tsr->copied_from = -1;
> +
> +	if (!dl->ops.source.addr) {
> +		invalidate_reg_state(tsr);
> +		return;
> +	}
> +
> +	tsr->kind = TSR_KIND_GLOBAL_ADDR;
> +	/* Partial page-relative address, finalized in next 'add/ldr' */
> +	tsr->addr = dl->ops.source.addr;
> +	tsr->offset = 0;
> +	tsr->ok = true;

[Severity: Medium]
Does this leave tsr->type containing a stale Dwarf_Die?

If the register previously held a structure or union type, the type field
isn't cleared here when switching to TSR_KIND_GLOBAL_ADDR.

If this register is later spilled to the stack, set_stack_state() will
evaluate the stale type_die. Since the type_die tag is DW_TAG_structure_type,
it will erroneously set stack->compound = true.

When the register is loaded back via ldr, update_load_insn_state() will then
branch into the compound type handling (die_get_member_type()) instead of
restoring the TSR_KIND_GLOBAL_ADDR state. 

Could this destroy the tracking state and cause perf annotate to fail resolving
the global variable's type?

> +
> +	pr_debug_dtp("adrp [%x] global addr=%#"PRIx64" -> reg%d\n",
> +		     insn_offset, tsr->addr, dreg);
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=19

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 21/21] perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer
  2026-07-01  3:53 ` [PATCH v3 21/21] perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer Tengda Wu
@ 2026-07-01  4:16   ` sashiko-bot
  2026-07-01  8:56     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:16 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Memory leak of the dynamically allocated `type_name` string obtained from `strbuf_detach()`.
- [Medium] Cross-DSO contamination and potential type confusion due to globally caching a DWARF offset (`task_struct_dieoff`).
--

commit 0f7d6f3c161448b3cb967b47ee3887be3d974dc1
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer

This commit adds support for parsing the 'mrs' instruction on arm64 in
'perf annotate' to track the 'current' task pointer. It dynamically queries
the DWARF debug information for the 'get_current()' inline function's
return type to find 'struct task_struct', caching the DWARF offset for
subsequent lookups.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index ec6fd59d51a27..89e3997f6721b 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
[ ... ]
> @@ -787,6 +789,81 @@ static void update_adrp_insn_state(struct type_state *state,
>  		     insn_offset, tsr->addr, dreg);
>  }
>  
> +static Dwarf_Off task_struct_dieoff;
> +
> +static void update_mrs_insn_state(struct type_state *state,
> +				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
> +				  struct disasm_line *dl,
> +				  struct annotated_op_loc *dst)
> +{
[ ... ]
> +	/* Handle case difference: LLVM (SP_EL0) vs objdump (sp_el0) */
> +	if (!dso__kernel(map__dso(dloc->ms->map)) ||
> +	    strcasecmp(dl->ops.source.raw, "sp_el0")) {
> +		invalidate_reg_state(tsr);
> +		return;
> +	}
> +
> +	ip = dloc->ms->sym->start + dl->al.offset;
> +	pc = map__rip_2objdump(dloc->ms->map, ip);
> +
> +	if (!task_struct_dieoff ||
> +	    !dwarf_offdie(dloc->di->dbg, task_struct_dieoff, &type_die)) {

[Severity: Medium]
Since DWARF offsets are local to their .debug_info section, if perf annotate
processes a symbol in a different kernel module (where dso__kernel() is still
true), couldn't this global task_struct_dieoff be applied to the module's
Dwarf * handle?

Would this result in type confusion or returning an arbitrary, unrelated DIE
across different DSOs?

> +		struct strbuf sb;
> +		char *type_name;
> +		/*
> +		 * Find the inline function 'get_current()' Dwarf_Die
> +		 * and obtain its return value data type, which should
> +		 * be 'struct task_struct*'.
> +		 */
> +		if (!die_find_inlinefunc(cu_die, pc, &func_die) ||
> +			!dwarf_attr_integrate(&func_die, DW_AT_type, &attr) ||
> +			!dwarf_formref_die(&attr, &type_die)) {
> +			invalidate_reg_state(tsr);
> +			return;
> +		}
> +
> +		strbuf_init(&sb, 32);
> +		die_get_typename_from_type(&type_die, &sb);
> +		type_name = strbuf_detach(&sb, NULL);
> +
> +		if (!type_name || strcmp(type_name, "struct task_struct*")) {
> +			invalidate_reg_state(tsr);
> +			return;
> +		}

[Severity: Medium]
Does this code leak the type_name string?

strbuf_detach() allocates and transfers ownership of the string to type_name,
but it appears to never be freed on either the success path or the error path
returning here.

> +
> +		/*
> +		 * Cache the 'struct task_struct*' die offset globally.
> +		 * This allows us to resolve stack canary accesses even
> +		 * in CUs that lack a full task_struct definition (e.g.,
> +		 * compiler-generated entry/exit code).
> +		 */
> +		task_struct_dieoff = dwarf_dieoffset(&type_die);
> +	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=21

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 16/21] perf annotate-arm64: Support stack variable tracking
  2026-07-01  3:53 ` [PATCH v3 16/21] perf annotate-arm64: Support stack variable tracking Tengda Wu
@ 2026-07-01  4:16   ` sashiko-bot
  0 siblings, 0 replies; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:16 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Medium] Incomplete handling of ARM64 load/store pair instructions (ldp/stp) causes type tracking corruption by ignoring the second register.
- [Medium] Missing invalidation of destination register when type resolution fails during a memory load.
- [Medium] Missing invalidation of stack state when storing an unknown register type.
- [Medium] Incorrect stack offset calculation when accessing memory via the stack pointer while a different frame base register is active.
--

commit 3e734515747646c2063edf487f47005c1433cf1a
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Support stack variable tracking

This commit extends update_insn_state() for arm64 to track data types
stored on the stack. It handles stack loads and stores to update the
register's type state or stack state. This allows 'perf annotate' to
maintain type information for local variables as they move between registers
and memory.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index f283596bb0589..255ccaf331309 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
[ ... ]
> @@ -368,6 +371,8 @@ static void update_load_insn_state(struct type_state *state,
>  	int reg_offset;
>  	int sreg = src->reg1;
>  	int dreg = dst->reg1;
        ^^^^^^^^^^^^^^^^^^^^
[Severity: Medium]
When handling ARM64 load pair instructions (ldp), does this code ignore the
second destination register (dst->reg2)? The second register of a load pair
could retain its stale type instead of being updated, which might lead to
false type propagation.

> +	int fbreg = dloc->fbreg;
> +	int fboff = 0;
>  
>  	if (!has_reg_type(state, dreg))
>  		goto out_adjust;
> @@ -375,7 +380,52 @@ static void update_load_insn_state(struct type_state *state,
>  	tsr = &state->regs[dreg];
>  	tsr->copied_from = -1;
>  
> +	if (dloc->fb_cfa) {
> +		u64 ip = dloc->ms->sym->start + dl->al.offset;
> +		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
> +
> +		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
> +			fbreg = -1;
> +	}
> +
>  retry:
> +	/* Check stack variables with offset */
> +	if (sreg == fbreg || sreg == state->stack_reg) {
> +		struct type_state_stack *stack;
> +		int offset = src->offset - fboff;
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Severity: Medium]
Will this subtract the frame base offset even when the access is relative to
the stack pointer (when sreg == state->stack_reg)? If the stack pointer is
used and fbreg is different (e.g., x29), this could yield an incorrect
internal stack offset and lose type information.

> +
> +		stack = find_stack_state(state, offset);
[ ... ]
> @@ -413,6 +463,70 @@ static void update_load_insn_state(struct type_state *state,
>  	adjust_reg_index_state(state, src, "ldr", insn_offset);
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Severity: Medium]
Is the destination register left uninvalidated if type resolution fails
during a memory load and the code falls through the multi-regs retry logic
to the out_adjust label? The destination register could retain its prior type
instead of being invalidated, even though its value was overwritten by an
unknown memory load.

>  }
>  
> +static void update_store_insn_state(struct type_state *state,
> +				    struct data_loc_info *dloc,
> +				    struct disasm_line *dl,
> +				    struct annotated_op_loc *src,
> +				    struct annotated_op_loc *dst)
> +{
> +	struct type_state_reg *tsr;
> +	u32 insn_offset = dl->al.offset;
> +	int sreg = src->reg1;
> +	int dreg = dst->reg1;
        ^^^^^^^^^^^^^^^^^^^^
[Severity: Medium]
Similar to the load instructions, does this logic ignore the second source
register (src->reg2) for ARM64 store pair instructions (stp)? The second
register of a store pair wouldn't be saved to the stack.

> +	int fbreg = dloc->fbreg;
> +	int fboff = 0;
> +
> +	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok)
> +		goto out_adjust;
        ^^^^^^^^^^^^^^^^
[Severity: Medium]
By skipping the rest of the function here, are we bypassing the invalidation
of the stack state? If an untracked register is stored into a stack slot that
previously held a tracked type, the stale type might persist on the stack.

> +
> +	if (dloc->fb_cfa) {
> +		u64 ip = dloc->ms->sym->start + dl->al.offset;
> +		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
> +
> +		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
> +			fbreg = -1;
> +	}
> +
> +	/* Check stack variables with offset */
> +	if (dreg == fbreg || dreg == state->stack_reg) {
> +		struct type_state_stack *stack;
> +		int offset = dst->offset - fboff;
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Severity: Medium]
Could this calculation yield an incorrect internal stack offset by
subtracting the frame base offset even when the access is relative to the
stack pointer? This could happen if the frame base register is different from
the stack pointer.

> +
> +		tsr = &state->regs[sreg];
[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=16

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 18/21] perf annotate-arm64: Support 'add' instruction tracking
  2026-07-01  3:53 ` [PATCH v3 18/21] perf annotate-arm64: Support 'add' " Tengda Wu
@ 2026-07-01  4:16   ` sashiko-bot
  2026-07-01  8:47     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:16 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Missing register invalidation when source registers are known non-pointer types, causing the destination register to incorrectly retain stale type information.
--

commit 5b1d520539dfc4b7d2f19958225cafb067fd8ce1
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Support 'add' instruction tracking

Extend update_insn_state() for arm64 to track 'add' instructions for
structure member address calculation, which commonly appear as:

  add  dreg, base, #offset
  add  dreg, base, reg2     (reg2 holds a constant)

For such instructions, determine the final offset value from either the
immediate or the constant in reg2.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> index be07a85057c8c..1fed188117192 100644
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
[ ... ]
> +static void update_add_insn_state(struct type_state *state,
> +				  struct disasm_line *dl,
> +				  struct annotated_op_loc *src,
> +				  struct annotated_op_loc *dst)
> +{
[ ... ]
> +	/* Handle calculation of a register holding a typed pointer */
> +	if (src_tsr.kind == TSR_KIND_POINTER ||
> +	    (src_tsr.kind == TSR_KIND_TYPE &&
> +	     dwarf_tag(&src_tsr.type) == DW_TAG_pointer_type)) {
[ ... ]
> +		pr_debug_type_name(&tsr->type, tsr->kind);
> +	}
> +	/* Or try another register if any */
> +	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
> +		sreg = src->reg2;
> +		goto retry;
> +	}
> +}

[Severity: Medium]
When the source registers are not tracked pointers or valid types, the code
falls through the if/else if chain and returns implicitly here without calling
invalidate_reg_state(tsr).

Since update_insn_state_arm64() specifically exempts "add" instructions from
generic register invalidation (as seen below), won't dreg incorrectly retain
its previous stale type information?

[ ... ]
> @@ -589,7 +673,7 @@ static void update_insn_state_arm64(struct type_state *state,
>  	 * the destination register itself to prevent incorrect type propagation.
>  	 */
>  	if (has_reg_type(state, dst->reg1) &&
> -	    strcmp(dl->ins.name, "mov") &&
> +	    strcmp(dl->ins.name, "add") && strcmp(dl->ins.name, "mov") &&
>  	    strncmp(dl->ins.name, "ld", 2) && strncmp(dl->ins.name, "st", 2)) {
>  		pr_debug_dtp("%s [%x] invalidate reg%d\n",
>  			     dl->ins.name, insn_offset, dst->reg1);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=18

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 20/21] perf annotate-arm64: Support per-cpu variable access tracking
  2026-07-01  3:53 ` [PATCH v3 20/21] perf annotate-arm64: Support per-cpu variable access tracking Tengda Wu
@ 2026-07-01  4:18   ` sashiko-bot
  0 siblings, 0 replies; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:18 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Low] Missing explicit inclusion of <string.h> for POSIX string functions.
--

commit 2114d02145b83196afb640a1330f3e5c5b83b2b7
Author: Tengda Wu <wutengda@huaweicloud.com>

    perf annotate-arm64: Support per-cpu variable access tracking
    
    Extend update_insn_state() for arm64 to handle per-cpu variable
    addressing.

[ ... ]

> diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
> index 6fa5cd373a464..b0bf0437168bb 100644
> --- a/tools/perf/util/annotate-data.c
> +++ b/tools/perf/util/annotate-data.c

[Severity: Low]
This is a pre-existing issue, but this file appears to be missing an explicit
inclusion of <string.h> for POSIX string functions.

At the top of the file, <string.h> is not included alongside the other headers:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <linux/zalloc.h>

However, the code uses strcmp() in data_type_cmp():

static int data_type_cmp(const void *_key, const struct rb_node *node)
{
...
	return strcmp(key->self.type_name, type->self.type_name);
}

Will this cause the build to fail when compiling under musl libc, which does
not implicitly include <string.h> through other system headers?

> @@ -1252,11 +1252,33 @@ static enum type_match_result check_matching_type(struct type_state *state,
>  	}
>  
>  	if (state->regs[reg].kind == TSR_KIND_PERCPU_BASE) {
> -		u64 var_addr = dloc->op->offset;
> +		u64 var_addr;

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=20

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 17/21] perf annotate-arm64: Support 'mov' instruction tracking
  2026-07-01  3:53 ` [PATCH v3 17/21] perf annotate-arm64: Support 'mov' instruction tracking Tengda Wu
@ 2026-07-01  4:21   ` sashiko-bot
  2026-07-01  8:46     ` Tengda Wu
  0 siblings, 1 reply; 52+ messages in thread
From: sashiko-bot @ 2026-07-01  4:21 UTC (permalink / raw)
  To: Tengda Wu; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Type state corruption due to DWARF register overloading for the stack pointer (`sp`) and zero registers (`xzr`/`wzr`).
--

commit 8b77f0fe5f1ec7e9d14eee63f44f48d6c55f7234
Author: Tengda Wu <wutengda@huaweicloud.com>

perf annotate-arm64: Support 'mov' instruction tracking

This commit extends update_insn_state() for arm64 to track register-to-register
and immediate-to-register 'mov' instructions. It propagates data type
information or immediate values to the destination register to improve type
inference.

> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c

[ ... ]

> +static void update_mov_insn_state(struct type_state *state,
> +				  struct disasm_line *dl,
> +				  struct annotated_op_loc *src,
> +				  struct annotated_op_loc *dst)
> +{

[ ... ]

> +	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok) {
> +		invalidate_reg_state(tsr);
> +		return;
> +	}
> +
> +	tsr->type = state->regs[sreg].type;

[Severity: Medium]
Will this code corrupt the stack pointer's type state if the zero register
(xzr or wzr) is used in a mov instruction?

Looking at tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c, the function
__get_dwarf_regnum_arm64() maps sp, wzr, and xzr all to DWARF register 31:

    if (!strcmp(name, "sp") || !strcmp(name, "wzr") || !strcmp(name, "xzr"))
        return 31;

If a zero register is used as a destination (e.g., mov xzr, x0), does its
tracked state overwrite the tracked state of the stack pointer?

Conversely, if a zero register is used as a source (e.g., mov x0, xzr),
does the destination register incorrectly inherit the type of the stack
pointer instead of being treated as a constant 0?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701035355.752944-1-wutengda@huaweicloud.com?part=17

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/21] perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
  2026-07-01  4:07   ` sashiko-bot
@ 2026-07-01  6:44     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  6:44 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:07, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
> - [High] Symbol resolution fails for relocated kernels (KASLR) and user-space shared libraries due to mismatched address spaces.
> - [Low] Build failure due to undeclared `machine` variable and undefined `machine__normalized_is()` function in `capstone_init()`.
> - [Low] Missing `<stdlib.h>` inclusion causes musl libc build failure.
> - [Low] Usage of `ARM64_GRP_BRANCH_RELATIVE` causes build failures with Capstone versions prior to 5.0.
> --
> 
> commit 1f9744f04e456b1209677658edb7f30a5b26e56d
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
> 
> The patch implements extended formatting for arm64 jump and adrp instructions
> during symbol__disassemble_capstone() to match objdump's expected output style.
> It adds instruction detail resolution in capstone_init() and formats the
> output with correct symbol and offset mappings.
> 
>> diff --git a/tools/perf/util/capstone.c b/tools/perf/util/capstone.c
>> index 9bba78ee0c5a2..a9c0a1d2eb036 100644
>> --- a/tools/perf/util/capstone.c
>> +++ b/tools/perf/util/capstone.c
> 
> [ ... ]
> 
>> @@ -218,6 +218,12 @@ static int capstone_init(uint16_t e_machine, csh *cs_handle, bool is64, bool is_
>>  		 * on x86 by investigating instruction details.
>>  		 */
>>  		perf_cs_option(*cs_handle, CS_OPT_DETAIL, CS_OPT_ON);
>> +	} else if (machine__normalized_is(machine, "arm64")) {
> 
> [Severity: Low]
> Does this code cause a build failure? The variable machine is not declared
> in capstone_init(), and the function machine__normalized_is() does not appear
> to be defined in the codebase.
> 

Agreed. This is due to commit 12c4737f55f2 ("perf capstone: Determine architecture from
e_machine"), which changed the capstone interface. I haven't rebased this part yet.
The base-commit already fails to build with capstone, so I didn't notice this one.

>> +		/*
>> +		 * Same as x86: arm64 needs instruction details to resolve
>> +		 * symbolic addresses
>> +		 */
>> +		perf_cs_option(*cs_handle, CS_OPT_DETAIL, CS_OPT_ON);
>>  	}
>>  
>>  	return 0;
> 
> [ ... ]
> 
>> +static bool is_pc_relative_insn(struct cs_insn *insn)
>> +{
>> +	int i;
>> +
>> +	if (insn->id == ARM64_INS_ADR || insn->id == ARM64_INS_ADRP)
>> +		return true;
>> +
>> +	if (insn->detail == NULL)
>> +		return false;
>> +
>> +	for (i = 0; i < insn->detail->groups_count; i++) {
>> +		if (insn->detail->groups[i] == ARM64_GRP_JUMP ||
>> +		    insn->detail->groups[i] == ARM64_GRP_CALL ||
>> +		    insn->detail->groups[i] == ARM64_GRP_BRANCH_RELATIVE)
> 
> [Severity: Low]
> Will this break compilation against Capstone versions prior to 5.0?
> ARM64_GRP_BRANCH_RELATIVE was introduced in Capstone 5.0. Since the perf
> tool's build feature detection doesn't mandate a specific Capstone version,
> relying on this enum might cause build failures on older setups.
> 

This enum was added in Capstone 5.0 over two years ago. I think we can
safely assume most systems are on a newer version now. I'll leave it as-is.

>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +static void format_capstone_insn_arm64(struct cs_insn *insn, char *buf,
>> +				       size_t len, struct annotate_args *args,
>> +				       u64 addr)
>> +{
> 
> [ ... ]
> 
>> +	if (is_pc_relative_insn(insn)) {
>> +		/* Extract last immediate value as address */
>> +		last_imm = strrchr(buf, '#');
>> +		if (!last_imm)
>> +			return;
>> +
>> +		orig_addr = strtoull(last_imm + 1, &endptr, 16);
> 
> [Severity: Low]
> Can this cause a build failure on musl libc? The POSIX standard function
> strtoull() is used here, but stdlib.h is not explicitly included in the
> file. Relying on implicit inclusions often works on glibc but breaks on
> musl libc.
> 

I'd say we can ignore this. strtoull() is used all over the perf codebase
and gets included indirectly. I don't think this is a real issue.

>> +		if (endptr == last_imm + 1)
>> +			return;
>> +
>> +		/* Relocate map that contains the address */
>> +		if (dso__kernel(map__dso(map))) {
>> +			found_map = maps__find(map__kmaps(map), orig_addr);
>> +			if (found_map == NULL)
>> +				return;
>> +			map = found_map;
>> +		}
>> +
>> +		/* Convert it to map-relative address for search */
>> +		addr = map__map_ip(map, orig_addr);
> 
> [Severity: High]
> Does this code fail to resolve symbols for relocated kernels (KASLR) or
> user-space shared libraries (PIE)?
> 
> The extracted orig_addr is in the objdump address space. However,
> maps__find() and map__map_ip() expect absolute memory addresses. By omitting
> the translation step via map__objdump_2mem(map, orig_addr) (which the x86
> implementation handles correctly), KASLR relocated kernels will fail in
> maps__find(), and user-space shared libraries will underflow in
> map__map_ip().
> 

Agreed. I missed map__objdump_2mem() here.

>> +
>> +		sym = map__find_symbol(map, addr);
>> +		if (sym == NULL) {
>> +			map__put(found_map);
>> +			return;
>> +		}
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 03/21] perf llvm: Fix arm64 adrp instruction disassembly mismatch with objdump
  2026-07-01  4:05   ` sashiko-bot
@ 2026-07-01  6:45     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  6:45 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:05, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Low] Missing explicit inclusion of <string.h> for POSIX string functions.
> --
> 
> commit c3571a268605b8e2ad28ebe5943729438021e5fe
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf llvm: Fix arm64 adrp instruction disassembly mismatch with objdump
> 
> This commit addresses an inconsistency between libllvm and objdump when
> parsing arm64 'adrp' instructions. It extracts the target address during
> callbacks and formats it as an absolute address with symbol names in
> symbol__disassemble_llvm().
> 
>> diff --git a/tools/perf/util/llvm.c b/tools/perf/util/llvm.c
>> index a0deb742a7331..533d47e8084d7 100644
>> --- a/tools/perf/util/llvm.c
>> +++ b/tools/perf/util/llvm.c
> 
> [ ... ]
> 
>> @@ -227,6 +241,42 @@ int symbol__disassemble_llvm(const char *filename, struct symbol *sym,
>>  				free(name);
>>  			}
>>  		}
>> +		if (storage.pcrel_adrp_addr != 0) {
>> +			/*
>> +			 * ADRP (Address Page) instructions encode a 21-bit signed
>> +			 * immediate offset relative to the current PC's page.
>> +			 *
>> +			 * To maintain consistency with standard objdump output,
>> +			 * we truncate the raw encoded immediate at the comma
>> +			 * and replace it with the resolved absolute page address.
>> +			 *
>> +			 * Example conversion:
>> +			 * From: adrp x18, 8014
>> +			 * To:   adrp x18, ffff800081f5f000 <this_cpu_vector>
>> +			 */
>> +			char *name;
>> +			char *s = strchr(disasm_buf, ',');
> 
> [Severity: Low]
> Does this file need to explicitly include <string.h>?
> 
> This patch introduces a new use of strchr() here in
> symbol__disassemble_llvm(), but tools/perf/util/llvm.c lacks a direct
> inclusion of <string.h> at the top of the file.
> 
> While this might compile under glibc due to implicit header inclusion,
> missing explicit POSIX headers can cause build failures on systems
> using musl libc, which enforces strict namespace separation.
> 

Agreed.

>> +
>> +			if (s == NULL)
>> +				goto err;
>> +
>> +			s++;
>> +			*s = '\0';
>> +			disasm_len = strlen(disasm_buf);
> 
> [ ... ]
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 04/21] perf annotate-arm64: Generalize arm64_mov__parse to support more instructions
  2026-07-01  4:03   ` sashiko-bot
@ 2026-07-01  6:57     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  6:57 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:03, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] The parsing logic permanently corrupts the raw instruction string buffer (`ops->raw`) when restoring the `<` character after extracting a source symbol name.
> --
> 
> commit 13742d589ccf5a86dcfd0991f541a8f1d46f7e0f
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate-arm64: Generalize arm64_mov__parse to support more instructions
> 
> This commit generalizes arm64_mov__parse() to parse source and target operands
> for a wider variety of arm64 instructions. It utilizes arm64__check_multi_regs()
> to identify multiple registers while attempting to preserve parsing behavior for
> instructions containing the addr <symbol> suffix.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
>> index 33080fdca125f..a1667237114a6 100644
>> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
>> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> 
> [ ... ]
> 
>> @@ -14,59 +16,129 @@
>>  static int arm64_mov__parse(const struct arch *arch,
>>  			    struct ins_operands *ops,
>>  			    struct map_symbol *ms __maybe_unused,
>>  			    struct disasm_line *dl __maybe_unused)
>>  {
> 
> [ ... ]
> 
>> +	/* Parse a symbol followed by an address, if present */
>>  	s = strchr(endptr, '<');
>>  	if (s == NULL)
>> -		goto out_free_target;
>> -	endptr = strchr(s + 1, '>');
>> +		return 0;
>> +	endptr = strrchr(s + 1, '>');
>>  	if (endptr == NULL)
>> -		goto out_free_target;
>> +		return 0;
>>  
>>  	*endptr = '\0';
>>  	*s = ' ';
>> -	ops->target.name = strdup(s);
>> +	s = skip_spaces(++s);
>> +	ops->source.name = strdup(s);
>>  	*s = '<';
>>  	*endptr = '>';
> 
> [Severity: Medium]
> Does this code overwrite the first character of the symbol name?
> 
> Because s is advanced by skip_spaces(++s), s now points to the first character
> of the symbol name. When *s = '<' is executed immediately after, it appears to
> corrupt the ops->raw instruction string buffer instead of restoring the '<'
> character at its original position. 
> 
> This could result in a visual glitch in the fallback raw instruction view
> where the first character of the symbol name is overwritten (for example,
> 0x1000 <sym> becomes 0x1000  <ym>).
> 

Agreed.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 05/21] perf annotate-arm64: Handle load and store instructions
  2026-07-01  4:07   ` sashiko-bot
@ 2026-07-01  7:03     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  7:03 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:07, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [Medium] arm64_ldst__parse fails to parse PC-relative load instructions, breaking operand extraction for literal pool loads.
> - [Medium] Failure to strip inline comments before parsing operands allows comments to corrupt `multi_regs` evaluation.
> --
> 
> commit 9939a67ebc615cf049eb33c68f83aa3646abb3f0
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate-arm64: Handle load and store instructions
> 
> This commit adds ldst_ops to parse data types and offsets for ARM64 memory
> access instructions using regex prefix matching.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
>> index a1667237114a6..451500def8c12 100644
>> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
>> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> [ ... ]
>> @@ -141,6 +142,104 @@ static const struct ins_ops arm64_mov_ops = {
> [ ... ]
>> +static int arm64_ldst__parse(const struct arch *arch,
>> +			     struct ins_operands *ops,
>> +			     struct map_symbol *ms __maybe_unused,
>> +			     struct disasm_line *dl)
>> +{
>> +	char *s, *left, *right;
>> +
>> +	right = s = strchr(ops->raw, arch->objdump.memory_ref_char);
>> +	if (!s)
>> +		return -1;
> 
> [Severity: Medium]
> Will this fail to parse PC-relative load instructions?
> 
> When processing a PC-relative load instruction (e.g., ldr x0, 4005c0 <sym>),
> the 'ld' prefix matches the new ldst_insn regex in arch__new_arm64() and
> routes it to arm64_ldst__parse().
> 
> Since PC-relative loads lack a bracket ('['), strchr() will return NULL,
> causing the function to immediately return -1. Does this leave operands
> unpopulated and break annotation features like data type tracking for
> literal pool loads?
> 

Agreed. I overlooked this type of PC-relative load:
  ldr     x0, ffff800080f40c68 <__kvm_nvhe_$d>

>> +
>> +	while (s > ops->raw && *s != ',')
>> +		--s;
>> +
>> +	if (s == ops->raw)
>> +		return -1;
>> +
>> +	*s = '\0';
>> +	left = strdup(ops->raw);
>> +
>> +	*s = ',';
>> +	if (!left)
>> +		return -1;
>> +
>> +	right = strdup(right);
>> +	if (!right) {
>> +		zfree(&left);
>> +		return -1;
>> +	}
>> +
>> +	if (arm64__insn_is_source_first(dl)) {
>> +		ops->source.raw = left;
>> +		ops->source.mem_ref = false;
>> +
>> +		ops->target.raw = right;
>> +		ops->target.mem_ref = true;
>> +		ops->target.multi_regs = arm64__check_multi_regs(arch, ops->target.raw);
> 
> [Severity: Medium]
> Does this code incorrectly include inline comments when evaluating multi_regs?
> 
> The string duplicated into 'right' includes trailing comments from the raw
> operand string because they are not stripped prior to parsing.
> 
> If objdump outputs an inline comment containing a comma and a string
> resembling a register name (for example: // read, x2), could
> arm64__check_multi_regs() wrongly flag the operand as having multiple
> registers and corrupt data type profiling?
> 

Agreed.

>> +	} else {
>> +		ops->source.raw = right;
>> +		ops->source.mem_ref = true;
>> +		ops->source.multi_regs = arm64__check_multi_regs(arch, ops->source.raw);
>> +
>> +		ops->target.raw = left;
>> +		ops->target.mem_ref = false;
>> +	}
>> +
>> +	return 0;
>> +}
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 06/21] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
  2026-07-01  4:07   ` sashiko-bot
@ 2026-07-01  7:14     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  7:14 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:07, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [Medium] The zero registers ('wzr' and 'xzr') are incorrectly mapped to DWARF register 31.
> - [Medium] The link register ('lr') alias is no longer recognized and will unexpectedly return -ENOENT.
> - [Low] The functions strcmp and strtol are used without explicitly including <string.h> and <stdlib.h>.
> --
> 
> commit cf10eca9678152514cc1354dc6997be3eec9f436
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
> 
> This commit replaces the static string table lookup for arm64 DWARF registers
> with a programmatic implementation in __get_dwarf_regnum_arm64(). This allows
> resolving register names directly into their corresponding DWARF numbers
> during annotation.
> 
>> diff --git a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
>> index 593ca7d4fccca..be55fc2a4f385 100644
>> --- a/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
>> +++ b/tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c
>> @@ -1,5 +1,6 @@
>>  // SPDX-License-Identifier: GPL-2.0
>>  #include <errno.h>
>> +#include <ctype.h>
> 
> [Severity: Low]
> Does omitting the explicit includes for <string.h> and <stdlib.h> break the
> build on musl libc? 
> 
> This file now uses strcmp() and strtol() below, and musl strictly separates
> declarations without relying on implicit inclusions.
> 

Agreed. Will add them.

>>  #include <dwarf-regs.h>
>>  #include "../../../arch/arm64/include/uapi/asm/perf_regs.h"
> 
> [ ... ]
> 
>> +int __get_dwarf_regnum_arm64(const char *name)
>> +{
>> +	int reg;
>> +
>> +	if (!strcmp(name, "sp") || !strcmp(name, "wzr") || !strcmp(name, "xzr"))
>> +		return 31;
> 
> [Severity: Medium]
> Does this incorrectly map the zero registers (wzr and xzr) to DWARF
> register 31?
> 
> The AArch64 DWARF ABI assigns register 31 strictly to the stack pointer (sp),
> and zero registers have no valid DWARF mapping.
> 
> Mapping wzr and xzr to 31 might cause perf annotate to conflate operations on
> the zero register with stack pointer operations, leading to incorrect stack
> variable tracking and type profiling statistics.
> 

The current data type profiling implementation doesn't actually use these
register mappings. I'll simply remove them in the next revision to avoid
any confusion.

>> +
>> +	if (*name != 'x' && *name != 'w')
>> +		return -ENOENT;
> 
> [Severity: Medium]
> Will this strict prefix check reject the link register alias lr?
> 
> The previous static table lookup correctly mapped lr to register 30.
> Because lr does not start with x or w, it appears this will now return
> -ENOENT, which might break data type tracking for instructions that operate
> on the link register.
> 

I haven't observed `lr` in objdump disassembly so far. I'll leave it
as-is for now and revisit if we encounter real cases.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 08/21] perf annotate: Introduce extract_op_location callback for arch-specific parsing
  2026-07-01  4:06   ` sashiko-bot
@ 2026-07-01  7:29     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  7:29 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:06, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [Medium] Refactoring changes the error propagation behavior for memory operands with unrecognized registers, causing valid instructions to be incorrectly marked as bad.
> - [Low] Missing explicit inclusion of `<stdlib.h>` for the POSIX libc function `strtol()`, breaking musl compatibility.
> --
> 
> commit a79e03a1004b442317df26b7478d0a1daed1e563
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate: Introduce extract_op_location callback for arch-specific parsing
> 
> This patch refactors operand parsing by moving it from the central
> annotate_get_insn_location() into architecture-specific callbacks. By
> introducing extract_op_location callbacks for PowerPC and x86, it aims to
> simplify the core logic and improve maintainability without making functional
> changes.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-x86.c b/tools/perf/util/annotate-arch/annotate-x86.c
>> index 7e61365363938..6a4b4bdcd7a7c 100644
>> --- a/tools/perf/util/annotate-arch/annotate-x86.c
>> +++ b/tools/perf/util/annotate-arch/annotate-x86.c
>> @@ -2,6 +2,7 @@
>>  #include <string.h>
>>  #include <linux/compiler.h>
>>  #include <linux/zalloc.h>
>> +#include <linux/ctype.h>
>>  #include <assert.h>
>>  #include <inttypes.h>
>>  #include "../annotate-data.h"
> 
> [Severity: Low]
> Does this file need an explicit include of <stdlib.h> for musl libc
> compatibility?
> 
> It appears strtol() is used later in extract_op_location_x86() without
> the required <stdlib.h> header being directly included here, which might
> lead to implicit function declaration errors on musl-based systems.
> 

Agreed.

> [ ... ]
> 
>> @@ -809,6 +810,86 @@ static void update_insn_state_x86(struct type_state *state,
>>  }
>>  #endif
>>  
>> +/*
>> + * Get register number and access offset from the given instruction.
>> + * It assumes AT&T x86 asm format like OFFSET(REG).  Maybe it needs
>> + * to revisit the format when it handles different architecture.
>> + * Fills @reg and @offset when return 0.
>> + */
>> +static int extract_reg_offset(const struct arch *arch, const char *str,
>> +			      struct annotated_op_loc *op_loc)
>> +{
> 
> [ ... ]
> 
>> +static int extract_op_location_x86(const struct arch *arch,
>> +				   struct disasm_line *dl __maybe_unused,
>> +				   const char *op_str, int op_idx __maybe_unused,
>> +				   struct annotated_op_loc *op_loc)
>> +{
>> +	const char *s = op_str;
>> +	char *p = NULL;
>> +
>> +	if (op_str == NULL)
>> +		return 0;
>> +
>> +	if (strchr(op_str, arch->objdump.memory_ref_char)) {
>> +		op_loc->mem_ref = true;
>> +		return extract_reg_offset(arch, op_str, op_loc);
> 
> [Severity: Medium]
> Does returning the result of extract_reg_offset() here alter the error
> propagation behavior?
> 
> In the previous annotate_get_insn_location() logic, the return value of
> extract_reg_offset() was safely ignored:
> 
>         } else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
>                 op_loc->mem_ref = true;
>                 op_loc->multi_regs = multi_regs;
>                 extract_reg_offset(arch, insn_str, op_loc);
>         } else {
> 
> By returning it here, extract_op_location_x86() might return -1 if the
> register is unrecognized. This regression is now propagated back to the
> caller in annotate_get_insn_location():
> 
>         if (arch->extract_op_location) {
>                 ret = arch->extract_op_location(arch, dl, insn_str, i, op_loc);
>                 if (ret)
>                         return ret;
>         }
> 
> Could this cause valid instructions to be completely skipped and
> incorrectly marked as bad?
> 

I don't think this is a problem. If we don't return an error here,
extract_reg_offset() would continue with an uninitialized or invalid
offset, which would then cause update_insn_state() to make incorrect
judgments. Returning -1 and marking it as bad is actually safer than
silently propagating wrong information.

>> +	}
>> +
>> +	/* FIXME: Handle other segment registers */
>> +	if (!strncmp(op_str, "%gs:", 4)) {
>> +		op_loc->segment = INSN_SEG_X86_GS;
>> +		op_loc->offset = strtol(op_str + 4,
>> +					&p, 0);
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/21] perf annotate-arm64: Implement extract_op_location() callback
  2026-07-01  4:10   ` sashiko-bot
@ 2026-07-01  7:36     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  7:36 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:10, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] Post-index memory references with register offsets are incorrectly classified as SIGNED_OFFSET.
> --
> 
> commit 1b03950d2edd24d57bf3a7ad6271aea1980c92e3
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate-arm64: Implement extract_op_location() callback
> 
> Implement the extract_op_location() callback for the arm64 architecture
> to handle its specific assembly syntax and addressing modes, enabling data
> type profiling for this architecture.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
>> index 451500def8c12..77b6f4841c849 100644
>> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
>> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> [ ... ]
>> +static int extract_op_location_arm64(const struct arch *arch,
>> +				     struct disasm_line *dl __maybe_unused,
>> +				     const char *op_str, int op_idx __maybe_unused,
>> +				     struct annotated_op_loc *op_loc)
>> +{
>> +	const char *s = op_str;
>> +	char *p = NULL;
>> +
>> +	if (op_str == NULL)
>> +		return 0;
> [ ... ]
>> +	/*
>> +	 * Handle memory references (e.g., [x0, #8]), identify
>> +	 * arm64 specific addressing modes
>> +	 */
>> +	if (*s == arch->objdump.memory_ref_char) {
>> +		op_loc->mem_ref = true;
>> +
>> +		p = strchr(s, ']');
>> +		if (p == NULL)
>> +			return -1;
>> +
>> +		/* Pre-index: [base, #imm]! */
>> +		if (p[1] == '!')
>> +			op_loc->addr_mode = PERF_ADDR_MODE_PRE_INDEX;
>> +		/* Post-index: [base], #imm */
>> +		else if (p[1] == ',' && strchr(p + 1, arch->objdump.imm_char))
>> +			op_loc->addr_mode = PERF_ADDR_MODE_POST_INDEX;
> 
> [Severity: Medium]
> Does this incorrectly restrict post-index addressing classification to only
> immediate offsets?
> 
> In ARM64, a comma immediately following the closing bracket (p[1] == ',') is
> sufficient to identify a post-index addressing mode, regardless of whether
> the offset is an immediate or a register (for example, Advanced SIMD
> instructions like ld1 {v0.16b}, [x2], x5).
> 
> If the condition fails for register offsets, it appears it will incorrectly
> fall through to PERF_ADDR_MODE_SIGNED_OFFSET. Could this cause the base
> register write-back to be silently ignored in adjust_reg_index_state(),
> leading to stale and corrupted type state tracking for the base register?
> 

I'd prefer to keep it as-is. SIMD instructions are not supported yet,
and the current restriction is safer. We can revisit if needed.

>> +		/* Signed offset: [base{, #imm}] */
>> +		else
>> +			op_loc->addr_mode = PERF_ADDR_MODE_SIGNED_OFFSET;
>> +
>> +		s++;
>> +	}
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 13/21] perf annotate-arm64: Enable instruction tracking support
  2026-07-01  4:12   ` sashiko-bot
@ 2026-07-01  7:56     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  7:56 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:12, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> 
> New issues:
> - [High] Missing function call register invalidation for ARM64 data type tracking.
> - [Medium] Failure to invalidate secondary destination registers (`reg2`) in multi-register instructions.
> 
> Pre-existing issues:
> - [High] Missing `multi_regs` evaluation for register operands in load/store instructions.
> --
> 
> commit a1f858405ccf3b88519235d7e08cb42762d969ac
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate-arm64: Enable instruction tracking support
> 
> Enable basic instruction tracking for arm64 by implementing three essential
> functions in the find_data_type_block() call path: arch_supports_insn_tracking,
> init_type_state, and update_insn_state.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
>> index 77b6f4841c849..48b1c94273eb6 100644
>> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
>> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> 
> [ ... ]
> 
>> @@ -324,6 +326,31 @@ static int extract_op_location_arm64(const struct arch *arch,
>>  	return 0;
>>  }
>>  
>> +#ifdef HAVE_LIBDW_SUPPORT
>> +static void update_insn_state_arm64(struct type_state *state,
>> +				    struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
>> +				    struct disasm_line *dl)
>> +{
>> +	struct annotated_insn_loc loc;
>> +	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
>> +	u32 insn_offset = dl->al.offset;
>> +
>> +	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
>> +		return;
>> +
> 
> [Severity: High]
> Does this function miss a check for function calls?
> 
> When an instruction is a function call (e.g., bl or blr), this code does not
> check ins__is_call(&dl->ins). As a result, it fails to invalidate the
> caller-saved registers (x0-x17) and update the return type in x0.
> 
> While init_type_state() correctly initializes the caller_saved flags for
> these registers, won't omitting the invalidation here cause subsequent
> accesses to inherit incorrect type states across function calls?
> 

Agreed.

>> +	/*
>> +	 * For unsupported instructions with a destination register, invalidate
>> +	 * the destination register itself to prevent incorrect type propagation.
>> +	 */
>> +	if (has_reg_type(state, dst->reg1)) {
>> +		pr_debug_dtp("%s [%x] invalidate reg%d\n",
>> +			     dl->ins.name, insn_offset, dst->reg1);
>> +		invalidate_reg_state(&state->regs[dst->reg1]);
>> +		return;
>> +	}
> 
> [Severity: Medium]
> Does this fallback logic need to handle secondary destination registers?
> 
> When an unsupported instruction modifies multiple registers (e.g., casp),
> this only checks and invalidates the primary destination register
> (dst->reg1). 
> 
> If the instruction has a secondary register (where dst->multi_regs is
> true), should we also invalidate dst->reg2 so it doesn't retain an
> incorrect type state?
> 

I'll leave it as-is. reg1 is enough for most cases, and multi-register
instructions are rare. We can revisit if needed.

> [Severity: High]
> This is a pre-existing issue, but I noticed a potential problem in
> arm64_ldst__parse() regarding multi-register instructions:
> 
> tools/perf/util/annotate-arch/annotate-arm64.c:arm64_ldst__parse() {
> ...
> 	if (arm64__insn_is_source_first(dl)) {
> 		ops->source.raw = left;
> 		ops->source.mem_ref = false;
> 		ops->target.raw = right;
> 		ops->target.mem_ref = true;
> 		ops->target.multi_regs = arm64__check_multi_regs(arch, ops->target.raw);
> 	}
> ...
> }
> 
> Here, arm64__check_multi_regs() is only applied to the memory operand side
> (ops->target in this case). The register operand side is left with
> multi_regs = false. 
> 
> Because of this, extract_op_location_arm64() fails to parse the second
> register (reg2) for instructions like ldp and stp, leaving it as -1. Won't
> this cause the data type tracking logic to silently miss type updates and
> queries for the secondary register?
> 

I'll leave it as-is. We only care about mem_ref cases. Multi-register
instructions without mem_ref are out of scope.

>> +}
>> +#endif
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 14/21] perf annotate-arm64: Support load instruction tracking
  2026-07-01  4:14   ` sashiko-bot
@ 2026-07-01  8:37     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  8:37 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:14, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [Medium] Missing invalidation of the destination register on type resolution failure during load instruction tracking.
> - [Medium] The data type profiling logic ignores the second destination register of load pair (ldp) instructions, leading to stale state retention.
> --
> 
> commit 579f01582f34ef50d993c4cb489f025353f8dbcc
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate-arm64: Support load instruction tracking
> 
> This commit extends update_insn_state_arm64() to handle LDR instructions.
> It tracks register state changes when data is loaded from memory to registers,
> supporting signed offset, pre-index, and post-index addressing modes.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
>> index 48b1c94273eb6..b48545f35620c 100644
>> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
>> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> 
> [ ... ]
> 
>> +static void update_load_insn_state(struct type_state *state,
>> +				   struct disasm_line *dl,
>> +				   struct annotated_op_loc *src,
>> +				   struct annotated_op_loc *dst)
>> +{
>> +	struct type_state_reg *tsr;
>> +	struct type_state_reg src_tsr;
>> +	Dwarf_Die type_die;
>> +	u32 insn_offset = dl->al.offset;
>> +	int reg_offset;
>> +	int sreg = src->reg1;
>> +	int dreg = dst->reg1;
> 
> [Severity: Medium]
> Does this ignore the second destination register for load pair (ldp)
> instructions?
> 
> If the instruction writes to two destination registers (e.g.,
> ldp x0, x1, [x2]), it seems dst->reg2 is not checked or invalidated here,
> which could leave it with stale type information. Does the parsing logic in
> arm64_ldst__parse() also need to set ops->target.multi_regs for this to be
> properly tracked?
> 
>> +
>> +	if (!has_reg_type(state, dreg))
>> +		goto out_adjust;
>> +
>> +	tsr = &state->regs[dreg];
>> +	tsr->copied_from = -1;
>> +
>> +retry:
>> +	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok) {
>> +		invalidate_reg_state(tsr);
>> +		return;
>> +	}
>> +
>> +	src_tsr = state->regs[sreg];
>> +	reg_offset = get_reg_index_offset(src);
>> +
>> +	/* Dereference the pointer if it has one */
>> +	if (src_tsr.kind == TSR_KIND_TYPE &&
>> +	    die_deref_ptr_type(&src_tsr.type,
>> +			       src_tsr.offset + reg_offset, &type_die)) {
>> +		tsr->type = type_die;
>> +		tsr->kind = TSR_KIND_TYPE;
>> +		tsr->offset = 0;
>> +		tsr->ok = true;
>> +
>> +		if (src->multi_regs) {
>> +			pr_debug_dtp("ldr [%x] %#x(reg%d, reg%d) -> reg%d",
>> +				     insn_offset, reg_offset, src->reg1,
>> +				     src->reg2, dreg);
>> +		} else {
>> +			pr_debug_dtp("ldr [%x] %#x(reg%d) -> reg%d",
>> +				     insn_offset, reg_offset, sreg, dreg);
>> +		}
>> +		pr_debug_type_name(&tsr->type, tsr->kind);
>> +	}
>> +	/* Or try another register if any */
>> +	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
>> +		sreg = src->reg2;
>> +		goto retry;
>> +	}
> 
> [Severity: Medium]
> If the type resolution fails and execution bypasses both of these if/else
> blocks, it falls through to out_adjust.
> 
> Should the destination register be invalidated here?
> 
> It looks like tsr->ok could remain true from its previous state, leaving it
> with incorrect type information.
> 

Agreed.

>> +
>> +out_adjust:
>> +	adjust_reg_index_state(state, src, "ldr", insn_offset);
>> +}
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 17/21] perf annotate-arm64: Support 'mov' instruction tracking
  2026-07-01  4:21   ` sashiko-bot
@ 2026-07-01  8:46     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  8:46 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:21, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] Type state corruption due to DWARF register overloading for the stack pointer (`sp`) and zero registers (`xzr`/`wzr`).
> --
> 
> commit 8b77f0fe5f1ec7e9d14eee63f44f48d6c55f7234
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate-arm64: Support 'mov' instruction tracking
> 
> This commit extends update_insn_state() for arm64 to track register-to-register
> and immediate-to-register 'mov' instructions. It propagates data type
> information or immediate values to the destination register to improve type
> inference.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
>> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
>> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> 
> [ ... ]
> 
>> +static void update_mov_insn_state(struct type_state *state,
>> +				  struct disasm_line *dl,
>> +				  struct annotated_op_loc *src,
>> +				  struct annotated_op_loc *dst)
>> +{
> 
> [ ... ]
> 
>> +	if (!has_reg_type(state, sreg) || !state->regs[sreg].ok) {
>> +		invalidate_reg_state(tsr);
>> +		return;
>> +	}
>> +
>> +	tsr->type = state->regs[sreg].type;
> 
> [Severity: Medium]
> Will this code corrupt the stack pointer's type state if the zero register
> (xzr or wzr) is used in a mov instruction?
> 
> Looking at tools/perf/util/dwarf-regs-arch/dwarf-regs-arm64.c, the function
> __get_dwarf_regnum_arm64() maps sp, wzr, and xzr all to DWARF register 31:
> 
>     if (!strcmp(name, "sp") || !strcmp(name, "wzr") || !strcmp(name, "xzr"))
>         return 31;
> 
> If a zero register is used as a destination (e.g., mov xzr, x0), does its
> tracked state overwrite the tracked state of the stack pointer?
> 
> Conversely, if a zero register is used as a source (e.g., mov x0, xzr),
> does the destination register incorrectly inherit the type of the stack
> pointer instead of being treated as a constant 0?
> 

I'll drop wzr/xzr register support to avoid this confusion.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 18/21] perf annotate-arm64: Support 'add' instruction tracking
  2026-07-01  4:16   ` sashiko-bot
@ 2026-07-01  8:47     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  8:47 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:16, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] Missing register invalidation when source registers are known non-pointer types, causing the destination register to incorrectly retain stale type information.
> --
> 
> commit 5b1d520539dfc4b7d2f19958225cafb067fd8ce1
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate-arm64: Support 'add' instruction tracking
> 
> Extend update_insn_state() for arm64 to track 'add' instructions for
> structure member address calculation, which commonly appear as:
> 
>   add  dreg, base, #offset
>   add  dreg, base, reg2     (reg2 holds a constant)
> 
> For such instructions, determine the final offset value from either the
> immediate or the constant in reg2.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
>> index be07a85057c8c..1fed188117192 100644
>> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
>> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> [ ... ]
>> +static void update_add_insn_state(struct type_state *state,
>> +				  struct disasm_line *dl,
>> +				  struct annotated_op_loc *src,
>> +				  struct annotated_op_loc *dst)
>> +{
> [ ... ]
>> +	/* Handle calculation of a register holding a typed pointer */
>> +	if (src_tsr.kind == TSR_KIND_POINTER ||
>> +	    (src_tsr.kind == TSR_KIND_TYPE &&
>> +	     dwarf_tag(&src_tsr.type) == DW_TAG_pointer_type)) {
> [ ... ]
>> +		pr_debug_type_name(&tsr->type, tsr->kind);
>> +	}
>> +	/* Or try another register if any */
>> +	else if (src->multi_regs && src->reg1 != src->reg2 && sreg != src->reg2) {
>> +		sreg = src->reg2;
>> +		goto retry;
>> +	}
>> +}
> 
> [Severity: Medium]
> When the source registers are not tracked pointers or valid types, the code
> falls through the if/else if chain and returns implicitly here without calling
> invalidate_reg_state(tsr).
> 
> Since update_insn_state_arm64() specifically exempts "add" instructions from
> generic register invalidation (as seen below), won't dreg incorrectly retain
> its previous stale type information?
> 

Already covered in the earlier discussion. Will fix all in one go.

> [ ... ]
>> @@ -589,7 +673,7 @@ static void update_insn_state_arm64(struct type_state *state,
>>  	 * the destination register itself to prevent incorrect type propagation.
>>  	 */
>>  	if (has_reg_type(state, dst->reg1) &&
>> -	    strcmp(dl->ins.name, "mov") &&
>> +	    strcmp(dl->ins.name, "add") && strcmp(dl->ins.name, "mov") &&
>>  	    strncmp(dl->ins.name, "ld", 2) && strncmp(dl->ins.name, "st", 2)) {
>>  		pr_debug_dtp("%s [%x] invalidate reg%d\n",
>>  			     dl->ins.name, insn_offset, dst->reg1);
> 



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 19/21] perf annotate-arm64: Support 'adrp' instruction to track global variables
  2026-07-01  4:15   ` sashiko-bot
@ 2026-07-01  8:48     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  8:48 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:15, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] Stale `type` field in register state causes `TSR_KIND_GLOBAL_ADDR` to be corrupted across stack spills.
> --
> 
> commit ddfb698eef878cf476210d72e082d8b79c811ecd
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate-arm64: Support 'adrp' instruction to track global variables
> 
> This patch extends update_insn_state() for arm64 to track global variable
> types calculated via page-relative addressing. It introduces
> TSR_KIND_GLOBAL_ADDR to represent the partial page-aligned target address
> produced by adrp instructions. The address is later combined with the offset
> from a subsequent add or ldr instruction to resolve the variable's type.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
>> index 1fed188117192..6f96e75d313dd 100644
>> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
>> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> [ ... ]
>> @@ -656,8 +700,37 @@ static void update_add_insn_state(struct type_state *state,
>>  	}
>>  }
>>  
>> +static void update_adrp_insn_state(struct type_state *state,
>> +				   struct disasm_line *dl,
>> +				   struct annotated_op_loc *dst)
>> +{
>> +	struct type_state_reg *tsr;
>> +	u32 insn_offset = dl->al.offset;
>> +	int dreg = dst->reg1;
>> +
>> +	if (!has_reg_type(state, dreg))
>> +		return;
>> +
>> +	tsr = &state->regs[dreg];
>> +	tsr->copied_from = -1;
>> +
>> +	if (!dl->ops.source.addr) {
>> +		invalidate_reg_state(tsr);
>> +		return;
>> +	}
>> +
>> +	tsr->kind = TSR_KIND_GLOBAL_ADDR;
>> +	/* Partial page-relative address, finalized in next 'add/ldr' */
>> +	tsr->addr = dl->ops.source.addr;
>> +	tsr->offset = 0;
>> +	tsr->ok = true;
> 
> [Severity: Medium]
> Does this leave tsr->type containing a stale Dwarf_Die?
> 
> If the register previously held a structure or union type, the type field
> isn't cleared here when switching to TSR_KIND_GLOBAL_ADDR.
> 
> If this register is later spilled to the stack, set_stack_state() will
> evaluate the stale type_die. Since the type_die tag is DW_TAG_structure_type,
> it will erroneously set stack->compound = true.
> 
> When the register is loaded back via ldr, update_load_insn_state() will then
> branch into the compound type handling (die_get_member_type()) instead of
> restoring the TSR_KIND_GLOBAL_ADDR state. 
> 
> Could this destroy the tracking state and cause perf annotate to fail resolving
> the global variable's type?
> 

Agreed. Need to clear the type explicitly.

>> +
>> +	pr_debug_dtp("adrp [%x] global addr=%#"PRIx64" -> reg%d\n",
>> +		     insn_offset, tsr->addr, dreg);
>> +}
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 21/21] perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer
  2026-07-01  4:16   ` sashiko-bot
@ 2026-07-01  8:56     ` Tengda Wu
  0 siblings, 0 replies; 52+ messages in thread
From: Tengda Wu @ 2026-07-01  8:56 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users



On 2026/7/1 12:16, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [Medium] Memory leak of the dynamically allocated `type_name` string obtained from `strbuf_detach()`.
> - [Medium] Cross-DSO contamination and potential type confusion due to globally caching a DWARF offset (`task_struct_dieoff`).
> --
> 
> commit 0f7d6f3c161448b3cb967b47ee3887be3d974dc1
> Author: Tengda Wu <wutengda@huaweicloud.com>
> 
> perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer
> 
> This commit adds support for parsing the 'mrs' instruction on arm64 in
> 'perf annotate' to track the 'current' task pointer. It dynamically queries
> the DWARF debug information for the 'get_current()' inline function's
> return type to find 'struct task_struct', caching the DWARF offset for
> subsequent lookups.
> 
>> diff --git a/tools/perf/util/annotate-arch/annotate-arm64.c b/tools/perf/util/annotate-arch/annotate-arm64.c
>> index ec6fd59d51a27..89e3997f6721b 100644
>> --- a/tools/perf/util/annotate-arch/annotate-arm64.c
>> +++ b/tools/perf/util/annotate-arch/annotate-arm64.c
> [ ... ]
>> @@ -787,6 +789,81 @@ static void update_adrp_insn_state(struct type_state *state,
>>  		     insn_offset, tsr->addr, dreg);
>>  }
>>  
>> +static Dwarf_Off task_struct_dieoff;
>> +
>> +static void update_mrs_insn_state(struct type_state *state,
>> +				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
>> +				  struct disasm_line *dl,
>> +				  struct annotated_op_loc *dst)
>> +{
> [ ... ]
>> +	/* Handle case difference: LLVM (SP_EL0) vs objdump (sp_el0) */
>> +	if (!dso__kernel(map__dso(dloc->ms->map)) ||
>> +	    strcasecmp(dl->ops.source.raw, "sp_el0")) {
>> +		invalidate_reg_state(tsr);
>> +		return;
>> +	}
>> +
>> +	ip = dloc->ms->sym->start + dl->al.offset;
>> +	pc = map__rip_2objdump(dloc->ms->map, ip);
>> +
>> +	if (!task_struct_dieoff ||
>> +	    !dwarf_offdie(dloc->di->dbg, task_struct_dieoff, &type_die)) {
> 
> [Severity: Medium]
> Since DWARF offsets are local to their .debug_info section, if perf annotate
> processes a symbol in a different kernel module (where dso__kernel() is still
> true), couldn't this global task_struct_dieoff be applied to the module's
> Dwarf * handle?
> 
> Would this result in type confusion or returning an arbitrary, unrelated DIE
> across different DSOs?
> 
>> +		struct strbuf sb;
>> +		char *type_name;
>> +		/*
>> +		 * Find the inline function 'get_current()' Dwarf_Die
>> +		 * and obtain its return value data type, which should
>> +		 * be 'struct task_struct*'.
>> +		 */
>> +		if (!die_find_inlinefunc(cu_die, pc, &func_die) ||
>> +			!dwarf_attr_integrate(&func_die, DW_AT_type, &attr) ||
>> +			!dwarf_formref_die(&attr, &type_die)) {
>> +			invalidate_reg_state(tsr);
>> +			return;
>> +		}
>> +
>> +		strbuf_init(&sb, 32);
>> +		die_get_typename_from_type(&type_die, &sb);
>> +		type_name = strbuf_detach(&sb, NULL);
>> +
>> +		if (!type_name || strcmp(type_name, "struct task_struct*")) {
>> +			invalidate_reg_state(tsr);
>> +			return;
>> +		}
> 
> [Severity: Medium]
> Does this code leak the type_name string?
> 
> strbuf_detach() allocates and transfers ownership of the string to type_name,
> but it appears to never be freed on either the success path or the error path
> returning here.
> 

Agreed. Will fix it in next version.

>> +
>> +		/*
>> +		 * Cache the 'struct task_struct*' die offset globally.
>> +		 * This allows us to resolve stack canary accesses even
>> +		 * in CUs that lack a full task_struct definition (e.g.,
>> +		 * compiler-generated entry/exit code).
>> +		 */
>> +		task_struct_dieoff = dwarf_dieoffset(&type_die);
>> +	}
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2026-07-01  8:56 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-01  3:53 [PATCH v3 00/21] perf arm64: Support data type profiling Tengda Wu
2026-07-01  3:53 ` [PATCH v3 01/21] perf capstone: Fix kernel map reference count leak Tengda Wu
2026-07-01  3:53 ` [PATCH v3 02/21] perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump Tengda Wu
2026-07-01  4:07   ` sashiko-bot
2026-07-01  6:44     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 03/21] perf llvm: Fix arm64 adrp instruction " Tengda Wu
2026-07-01  4:05   ` sashiko-bot
2026-07-01  6:45     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 04/21] perf annotate-arm64: Generalize arm64_mov__parse to support more instructions Tengda Wu
2026-07-01  4:03   ` sashiko-bot
2026-07-01  6:57     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 05/21] perf annotate-arm64: Handle load and store instructions Tengda Wu
2026-07-01  4:07   ` sashiko-bot
2026-07-01  7:03     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 06/21] perf dwarf-regs: Adapt get_dwarf_regnum() for arm64 Tengda Wu
2026-07-01  4:07   ` sashiko-bot
2026-07-01  7:14     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 07/21] perf annotate: Adapt arch__dwarf_regnum() " Tengda Wu
2026-07-01  3:53 ` [PATCH v3 08/21] perf annotate: Introduce extract_op_location callback for arch-specific parsing Tengda Wu
2026-07-01  4:06   ` sashiko-bot
2026-07-01  7:29     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 09/21] perf annotate-arm64: Implement extract_op_location() callback Tengda Wu
2026-07-01  4:10   ` sashiko-bot
2026-07-01  7:36     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 10/21] perf annotate: Deduplicate overlapping ARM SPE events for data type profiling Tengda Wu
2026-07-01  4:06   ` sashiko-bot
2026-07-01  3:53 ` [PATCH v3 11/21] perf auxtrace: Set default period to 1 for PERF_ITRACE_PERIOD_INSTRUCTIONS type Tengda Wu
2026-07-01  4:05   ` sashiko-bot
2026-07-01  3:53 ` [PATCH v3 12/21] perf annotate-data: Extract invalidate_reg_state() as a common helper Tengda Wu
2026-07-01  3:53 ` [PATCH v3 13/21] perf annotate-arm64: Enable instruction tracking support Tengda Wu
2026-07-01  4:12   ` sashiko-bot
2026-07-01  7:56     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 14/21] perf annotate-arm64: Support load instruction tracking Tengda Wu
2026-07-01  4:14   ` sashiko-bot
2026-07-01  8:37     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 15/21] perf annotate-arm64: Support store " Tengda Wu
2026-07-01  3:53 ` [PATCH v3 16/21] perf annotate-arm64: Support stack variable tracking Tengda Wu
2026-07-01  4:16   ` sashiko-bot
2026-07-01  3:53 ` [PATCH v3 17/21] perf annotate-arm64: Support 'mov' instruction tracking Tengda Wu
2026-07-01  4:21   ` sashiko-bot
2026-07-01  8:46     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 18/21] perf annotate-arm64: Support 'add' " Tengda Wu
2026-07-01  4:16   ` sashiko-bot
2026-07-01  8:47     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 19/21] perf annotate-arm64: Support 'adrp' instruction to track global variables Tengda Wu
2026-07-01  4:15   ` sashiko-bot
2026-07-01  8:48     ` Tengda Wu
2026-07-01  3:53 ` [PATCH v3 20/21] perf annotate-arm64: Support per-cpu variable access tracking Tengda Wu
2026-07-01  4:18   ` sashiko-bot
2026-07-01  3:53 ` [PATCH v3 21/21] perf annotate-arm64: Support 'mrs' instruction to track 'current' pointer Tengda Wu
2026-07-01  4:16   ` sashiko-bot
2026-07-01  8:56     ` Tengda Wu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox