From: James Clark <james.clark@linaro.org>
To: Tengda Wu <wutengda@huaweicloud.com>
Cc: Bill Wendling <morbo@google.com>,
Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Zecheng Li <zli94@ncsu.edu>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
llvm@lists.linux.dev, Peter Zijlstra <peterz@infradead.org>,
Namhyung Kim <namhyung@kernel.org>,
leo.yan@linux.dev, Li Huafei <lihuafei1@huawei.com>,
Ian Rogers <irogers@google.com>,
Kim Phillips <kim.phillips@arm.com>,
Mark Rutland <mark.rutland@arm.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Date: Thu, 16 Apr 2026 16:31:56 +0100
Message-ID: <55fde39f-36b4-44b4-bb9f-4179c6d2d066@linaro.org>
In-Reply-To: <20260403094800.1418825-1-wutengda@huaweicloud.com>
On 03/04/2026 10:47, Tengda Wu wrote:
> This patch series implements data type profiling support for arm64,
> building upon the foundational work previously contributed by Huafei [1].
> While the initial version laid the groundwork for arm64 data type analysis,
> this series iterates on that work by refining instruction parsing and
> extending support for core architectural features.
>
> The series is organized as follows:
>
> 1. Fix disassembly mismatches (Patches 01-02)
> Current perf annotate supports three disassembly backends: llvm,
> capstone, and objdump. On arm64, inconsistencies between the output
> of these backends (specifically llvm/capstone vs. objdump) often
> prevent the tracker from correctly identifying registers and offsets.
> These patches resolve these mismatches, ensuring consistent instruction
> parsing across all supported backends.
>
> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
> These patches establish the necessary infrastructure for arm64-specific
> operand handling. This includes implementing new callbacks and data
> structures to manage arm64's unique addressing modes and register sets.
> This foundation is essential for the subsequent type-tracking logic.
>
> 3. Core instruction tracking (Patches 08-16)
> These patches implement the core logic for type tracking on arm64,
> covering a wide range of instructions including:
>
> * Memory Access: ldr/str variants (including stack-based access).
> * Arithmetic & Data Processing: mov, add, and adrp.
> * Special Access: System register access (mrs) and per-cpu variable
> tracking.
>
> The implementation draws inspiration from the existing x86 logic while
> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
> perf annotate can successfully resolve memory locations and register
> types, enabling comprehensive data type profiling on arm64 platforms.
>
> Example Result
> ==============
>
> # perf mem record -a -K -- sleep 1
> # perf annotate --data-type --type-stat --stdio
Hi Tengda,
Did you run this with any itrace options? If I run your command I get
repeated blocks of duplicate stats and types, one for each sample type
that we generate when decoding SPE, which is very confusing.
For example, the default perf report output has all these groups:
Available samples
0 arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=30/
0 dummy:u
3 l1d-miss
18 l1d-access
0 llc-miss
0 llc-access
0 tlb-miss
22 tlb-access
0 branch
0 remote-access
22 memory
22 instructions
Obviously there are 22 samples in total (instructions), and they get
duplicated into whatever other categories they happen to have flags for.
To remove the duplicates you have to pass --itrace=i1i. Should that be
the default for perf annotate with SPE?
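To make the duplication concrete, here is a toy model of it. This is only a sketch, not perf's SPE decoder: the per-record flag sets are invented so that the group totals match the listing above, and synthesize() is a hypothetical helper.

```python
from collections import Counter

# Hypothetical flags for 22 SPE records, chosen only so the group totals
# match the listing above (3 l1d-miss, 18 l1d-access). Assumption: every
# l1d-miss record also counts as an l1d-access.
records = (
    [{"l1d-access", "l1d-miss"}] * 3
    + [{"l1d-access"}] * 15
    + [set()] * 4
)

# Groups that every record lands in, going by the listing (22 each).
ALWAYS = {"tlb-access", "memory", "instructions"}


def synthesize(records, groups=None):
    """Count one synthesized sample per (record, group) pair.

    With groups=None every group is kept; otherwise only the named ones.
    """
    counts = Counter()
    for flags in records:
        for group in flags | ALWAYS:
            if groups is None or group in groups:
                counts[group] += 1
    return counts


all_groups = synthesize(records)
only_insn = synthesize(records, groups={"instructions"})

print(sum(all_groups.values()))  # samples seen when every group is kept
print(sum(only_insn.values()))   # samples seen with only 'instructions'
```

In this model, anything that aggregates over every group sees each record several times, while keeping only the instructions group leaves one sample per record. That is the effect I get from --itrace=i1i.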
> Annotate data type stats:
> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
> -----------------------------------------------------------
> 29 : no_sym
> 196 : no_var
> 806 : no_typeinfo
> 82 : bad_offset
> 1370 : insn_track
>
> Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
> ============================================================================
> Percent offset size field
> 100.00 0 0x40 struct page {
> 9.95 0 0x8 long unsigned int flags;
> 52.83 0x8 0x28 union {
> 52.83 0x8 0x28 struct {
> 37.21 0x8 0x10 union {
> 37.21 0x8 0x10 struct list_head lru {
> 37.21 0x8 0x8 struct list_head* next;
> 0.00 0x10 0x8 struct list_head* prev;
> };
> 37.21 0x8 0x10 struct {
> 37.21 0x8 0x8 void* __filler;
> 0.00 0x10 0x4 unsigned int mlock_count;
> ...
>
> Changes since v1: (reworked from Huafei's series):
>
> - Fix inconsistencies in arm64 instruction output across llvm, capstone,
> and objdump disassembly backends.
> - Support arm64-specific addressing modes and operand formats. (Leo Yan)
> - Extend instruction tracking to support mov and add instructions,
> along with per-cpu and stack variables.
> - Include real-world examples in commit messages to demonstrate
> practical effects. (Namhyung Kim)
> - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
> https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
>
> Please let me know if you have any feedback.
>
> Thanks,
> Tengda
>
> [1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
> [2] https://developer.arm.com/documentation/102374/0103
> [3] https://github.com/flynd/asmsheets/releases/tag/v8
>
> ---
>
> Tengda Wu (16):
> perf llvm: Fix arm64 adrp instruction disassembly mismatch with
> objdump
> perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
> perf annotate-arm64: Generalize arm64_mov__parse to support standard
> operands
> perf annotate-arm64: Handle load and store instructions
> perf annotate: Introduce extract_op_location callback for
> arch-specific parsing
> perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
> perf annotate-arm64: Implement extract_op_location() callback
> perf annotate-arm64: Enable instruction tracking support
> perf annotate-arm64: Support load instruction tracking
> perf annotate-arm64: Support store instruction tracking
> perf annotate-arm64: Support stack variable tracking
> perf annotate-arm64: Support 'mov' instruction tracking
> perf annotate-arm64: Support 'add' instruction tracking
> perf annotate-arm64: Support 'adrp' instruction to track global
> variables
> perf annotate-arm64: Support per-cpu variable access tracking
> perf annotate-arm64: Support 'mrs' instruction to track 'current'
> pointer
>
> .../perf/util/annotate-arch/annotate-arm64.c | 642 +++++++++++++++++-
> .../util/annotate-arch/annotate-powerpc.c | 10 +
> tools/perf/util/annotate-arch/annotate-x86.c | 88 ++-
> tools/perf/util/annotate-data.c | 72 +-
> tools/perf/util/annotate-data.h | 7 +-
> tools/perf/util/annotate.c | 108 +--
> tools/perf/util/annotate.h | 12 +
> tools/perf/util/capstone.c | 107 ++-
> tools/perf/util/disasm.c | 5 +
> tools/perf/util/disasm.h | 5 +
> .../util/dwarf-regs-arch/dwarf-regs-arm64.c | 20 +
> tools/perf/util/dwarf-regs.c | 2 +-
> tools/perf/util/include/dwarf-regs.h | 1 +
> tools/perf/util/llvm.c | 50 ++
> 14 files changed, 984 insertions(+), 145 deletions(-)
>
>
> base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb