public inbox for linux-perf-users@vger.kernel.org
From: James Clark <james.clark@linaro.org>
To: Tengda Wu <wutengda@huaweicloud.com>
Cc: Bill Wendling <morbo@google.com>,
	Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Zecheng Li <zli94@ncsu.edu>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	llvm@lists.linux.dev, Peter Zijlstra <peterz@infradead.org>,
	Namhyung Kim <namhyung@kernel.org>,
	leo.yan@linux.dev, Li Huafei <lihuafei1@huawei.com>,
	Ian Rogers <irogers@google.com>,
	Kim Phillips <kim.phillips@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Date: Wed, 22 Apr 2026 10:50:55 +0100
Message-ID: <eb00bda8-7ad7-4101-8fa6-1ecbb2f2fb82@linaro.org>
In-Reply-To: <20260403094800.1418825-1-wutengda@huaweicloud.com>



On 03/04/2026 10:47, Tengda Wu wrote:
> This patch series implements data type profiling support for arm64,
> building upon the foundational work previously contributed by Huafei [1].
> While the initial version laid the groundwork for arm64 data type analysis,
> this series iterates on that work by refining instruction parsing and
> extending support for core architectural features.
> 
> The series is organized as follows:
> 
> 1. Fix disassembly mismatches (Patches 01-02)
>     Current perf annotate supports three disassembly backends: llvm,
>     capstone, and objdump. On arm64, inconsistencies between the output
>     of these backends (specifically llvm/capstone vs. objdump) often
>     prevent the tracker from correctly identifying registers and offsets.
>     These patches resolve these mismatches, ensuring consistent instruction
>     parsing across all supported backends.

Did you try recording the perf datasym workload? With llvm-objdump I
only get hits on data1 and not on data2. With binutils I don't get any
hits on that struct at all, although the rest of the samples (in
ld-linux-aarch64.so etc.) look roughly the same between binutils and
llvm. I would have thought a simple example like datasym would work
with both.

  $ perf record -e arm_spe_0/load_filter=1,store_filter=1,min_latency=30/u \
      -c 10000 -- perf test -w datasym

  $ perf annotate --data-type --type-stat --itrace=i1i --stdio

With llvm-objdump-14:

   Annotate data type stats:
   total 25, ok 19 (76.0%), bad 6 (24.0%)
   -----------------------------------------------------------
          1 : no_sym
          1 : no_mem_ops
          3 : no_var
          1 : no_typeinfo
          9 : insn_track

   Annotate type: 'struct buf' in build/local/perf (6663 samples):
   ===============================================================
    Percent     offset       size  field
     100.00          0       0x40  struct buf       {
     100.00          0        0x1      char        data1;
       0.00        0x1       0x37      char[]      reserved;
       0.00       0x38        0x1      char        data2;
                                 };



With binutils that entry is missing:

   Annotate data type stats:
   total 25, ok 14 (56.0%), bad 11 (44.0%)
   -----------------------------------------------------------
          1 : no_sym
          1 : no_cuinfo
          3 : no_var
          6 : no_typeinfo
          4 : insn_track

...


But with the following patch I get plausible output for datasym with
llvm, where both data1 and data2 have hits. It looks like you need to
add the operand offset when calling get_global_var_type() for
TSR_KIND_GLOBAL_ADDR, otherwise every access resolves to the first
member of the struct:

   Annotate data type stats:
   total 4, ok 2 (50.0%), bad 2 (50.0%)
   -----------------------------------------------------------
          1 : no_sym
          1 : no_typeinfo
          2 : insn_track

   Annotate type: 'struct buf' in build/local/perf (35 samples):
   =====================================================================
    Percent     offset       size  field
     100.00          0       0x39  struct buf       {
      40.00          0        0x1      char        data1;
       0.00        0x1       0x37      char[]      reserved;
      60.00       0x38        0x1      char        data2;
                                 };


diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 7161417d1c76..0e5825121227 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1287,7 +1287,9 @@ static enum type_match_result check_matching_type(struct type_state *state,
                  * The register holds the address of a global variable. Try to
                  * find the variable by the address and get its type.
                  */
-               if (get_global_var_type(cu_die, dloc, dloc->ip, state->regs[reg].addr,
+               var_addr = state->regs[reg].addr + dloc->op->offset;
+
+               if (get_global_var_type(cu_die, dloc, dloc->ip, var_addr,
                                         &var_offset, type_die)) {
                         dloc->type_offset = var_offset;
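
To illustrate why the offset matters, here is a minimal standalone
sketch. It is not perf code: the struct layout is copied from the
'struct buf' report above, while member_at(), accessed_addr() and the
member table are hypothetical stand-ins for the DWARF lookup, and the
ldrb instruction in the comment is only an assumed example of such an
access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout copied from the 'struct buf' report; not taken from perf's source. */
struct buf {
	char data1;          /* offset 0x00 */
	char reserved[0x37]; /* offset 0x01 */
	char data2;          /* offset 0x38 */
};

static_assert(offsetof(struct buf, data2) == 0x38,
	      "layout must match the report");

/* Hypothetical member table standing in for the DWARF type information. */
struct member {
	const char *name;
	size_t offset;
	size_t size;
};

static const struct member buf_members[] = {
	{ "data1",    offsetof(struct buf, data1),    sizeof(char) },
	{ "reserved", offsetof(struct buf, reserved), 0x37 },
	{ "data2",    offsetof(struct buf, data2),    sizeof(char) },
};

/*
 * Hypothetical lookup: map an accessed address to the member of the
 * variable that starts at var_start. Returns NULL if the address is
 * outside the variable.
 */
static const char *member_at(uint64_t var_start, uint64_t addr)
{
	uint64_t off = addr - var_start; /* wraps to a huge value if addr < var_start */
	size_t i;

	for (i = 0; i < sizeof(buf_members) / sizeof(buf_members[0]); i++) {
		const struct member *m = &buf_members[i];

		if (off >= m->offset && off < m->offset + m->size)
			return m->name;
	}
	return NULL;
}

/*
 * Address touched by an access such as "ldrb w1, [x0, #0x38]" (assumed
 * example): the register value plus the operand offset.
 */
static uint64_t accessed_addr(uint64_t reg_addr, int64_t op_offset)
{
	return reg_addr + (uint64_t)op_offset;
}
```

With a register holding the start of the variable, looking up the bare
register value always lands on data1, whereas looking up
accessed_addr(reg, 0x38) lands on data2. That matches the difference
between the two reports above.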


> 
> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
>     These patches establish the necessary infrastructure for arm64-specific
>     operand handling. This includes implementing new callbacks and data
>     structures to manage arm64's unique addressing modes and register sets.
>     This foundation is essential for the subsequent type-tracking logic.
> 
> 3. Core instruction tracking (Patches 08-16)
>     These patches implement the core logic for type tracking on arm64,
>     covering a wide range of instructions including:
> 
>     * Memory Access: ldr/str variants (including stack-based access).
>     * Arithmetic & Data Processing: mov, add, and adrp.
>     * Special Access: System register access (mrs) and per-cpu variable
>       tracking.
> 
> The implementation draws inspiration from the existing x86 logic while
> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
> perf annotate can successfully resolve memory locations and register
> types, enabling comprehensive data type profiling on arm64 platforms.
> 
> Example Result
> ==============
> 
> # perf mem record -a -K -- sleep 1
> # perf annotate --data-type --type-stat --stdio
> Annotate data type stats:
> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
> -----------------------------------------------------------
>          29 : no_sym
>         196 : no_var
>         806 : no_typeinfo
>          82 : bad_offset
>        1370 : insn_track
> 
> Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
> ============================================================================
>   Percent     offset       size  field
>    100.00          0       0x40  struct page      {
>      9.95          0        0x8      long unsigned int   flags;
>     52.83        0x8       0x28      union        {
>     52.83        0x8       0x28          struct   {
>     37.21        0x8       0x10              union        {
>     37.21        0x8       0x10                  struct list_head        lru {
>     37.21        0x8        0x8                      struct list_head*   next;
>      0.00       0x10        0x8                      struct list_head*   prev;
>                                                  };
>     37.21        0x8       0x10                  struct   {
>     37.21        0x8        0x8                      void*       __filler;
>      0.00       0x10        0x4                      unsigned int        mlock_count;
>     ...
> 
> Changes since v1: (reworked from Huafei's series):
> 
>   - Fix inconsistencies in arm64 instruction output across llvm, capstone,
>     and objdump disassembly backends.
>   - Support arm64-specific addressing modes and operand formats. (Leo Yan)
>   - Extend instruction tracking to support mov and add instructions,
>     along with per-cpu and stack variables.
>   - Include real-world examples in commit messages to demonstrate
>     practical effects. (Namhyung Kim)
>   - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
>     https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
> 
> Please let me know if you have any feedback.
> 
> Thanks,
> Tengda
> 
> [1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
> [2] https://developer.arm.com/documentation/102374/0103
> [3] https://github.com/flynd/asmsheets/releases/tag/v8
> 
> ---
> 
> Tengda Wu (16):
>    perf llvm: Fix arm64 adrp instruction disassembly mismatch with
>      objdump
>    perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
>    perf annotate-arm64: Generalize arm64_mov__parse to support standard
>      operands
>    perf annotate-arm64: Handle load and store instructions
>    perf annotate: Introduce extract_op_location callback for
>      arch-specific parsing
>    perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
>    perf annotate-arm64: Implement extract_op_location() callback
>    perf annotate-arm64: Enable instruction tracking support
>    perf annotate-arm64: Support load instruction tracking
>    perf annotate-arm64: Support store instruction tracking
>    perf annotate-arm64: Support stack variable tracking
>    perf annotate-arm64: Support 'mov' instruction tracking
>    perf annotate-arm64: Support 'add' instruction tracking
>    perf annotate-arm64: Support 'adrp' instruction to track global
>      variables
>    perf annotate-arm64: Support per-cpu variable access tracking
>    perf annotate-arm64: Support 'mrs' instruction to track 'current'
>      pointer
> 
>   .../perf/util/annotate-arch/annotate-arm64.c  | 642 +++++++++++++++++-
>   .../util/annotate-arch/annotate-powerpc.c     |  10 +
>   tools/perf/util/annotate-arch/annotate-x86.c  |  88 ++-
>   tools/perf/util/annotate-data.c               |  72 +-
>   tools/perf/util/annotate-data.h               |   7 +-
>   tools/perf/util/annotate.c                    | 108 +--
>   tools/perf/util/annotate.h                    |  12 +
>   tools/perf/util/capstone.c                    | 107 ++-
>   tools/perf/util/disasm.c                      |   5 +
>   tools/perf/util/disasm.h                      |   5 +
>   .../util/dwarf-regs-arch/dwarf-regs-arm64.c   |  20 +
>   tools/perf/util/dwarf-regs.c                  |   2 +-
>   tools/perf/util/include/dwarf-regs.h          |   1 +
>   tools/perf/util/llvm.c                        |  50 ++
>   14 files changed, 984 insertions(+), 145 deletions(-)
> 
> 
> base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb

