From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Leo Yan <leo.yan@arm.com>
Cc: John Garry <john.g.garry@oracle.com>,
Will Deacon <will@kernel.org>,
James Clark <james.clark@linaro.org>,
Mike Leach <mike.leach@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Namhyung Kim <namhyung@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Al Grant <al.grant@arm.com>,
Paschalis Mpeis <paschalis.mpeis@arm.com>,
Amir Ayupov <aaupov@fb.com>,
linux-arm-kernel@lists.infradead.org, coresight@lists.linaro.org,
linux-perf-users@vger.kernel.org, Leo Yan <leo.yan@linux.dev>
Subject: Re: [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain
Date: Fri, 29 May 2026 11:57:14 -0300
Message-ID: <ahmpSnDTJ14_BNiv@x1>
In-Reply-To: <20260526-b4-arm_cs_callchain_support_v1-v6-0-f9f49f53c9dd@arm.com>
On Tue, May 26, 2026 at 05:59:36PM +0100, Leo Yan wrote:
> This series adds thread-stack and synthesized callchain support for Arm
> CoreSight. It is based on an older series [1] but has been heavily
> rewritten.
Hi Leo,
Please add what changed from v5, v4, etc.
- Arnaldo
> CS ETM previously kept last-branch state in a per-trace-queue buffer.
> That effectively makes the state per CPU, while the call/return history
> belongs to a thread. This series moves branch tracking to the common
> thread-stack code.
>
> The series records CoreSight branches with thread_stack__event(), uses
> thread_stack__br_sample() for last branch entries, and flushes thread
> stacks after decoder resets.
>
> A decoder reset between AUX trace buffers is treated as a global trace
> discontinuity, so all thread stacks are flushed. This avoids carrying
> stale call/return history across the discontinuity.
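
The per-thread tracking and the global flush described above can be pictured with a minimal sketch. This is not the perf thread-stack code: the sketch_* names, the fixed-size arrays and the small-integer tid index are all assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_THREADS 4	/* hypothetical limits, illustration only */
#define SKETCH_MAX_DEPTH   16

/* Hypothetical per-thread call stack; not perf's struct thread_stack. */
struct sketch_stack {
	uint64_t ret_addr[SKETCH_MAX_DEPTH];
	size_t depth;
};

/* One stack per thread (indexed by a small tid), not per CPU or queue. */
static struct sketch_stack sketch_stacks[SKETCH_MAX_THREADS];

/* Record a call: remember where the callee is expected to return. */
static void sketch_call(unsigned int tid, uint64_t ret_addr)
{
	struct sketch_stack *ts;

	assert(tid < SKETCH_MAX_THREADS);
	ts = &sketch_stacks[tid];
	if (ts->depth < SKETCH_MAX_DEPTH)
		ts->ret_addr[ts->depth++] = ret_addr;
}

/* A decoder reset is a global discontinuity: drop history for all threads. */
static void sketch_flush_all(void)
{
	for (size_t i = 0; i < SKETCH_MAX_THREADS; i++)
		sketch_stacks[i].depth = 0;
}

/* Current call depth recorded for one thread. */
static size_t sketch_depth(unsigned int tid)
{
	assert(tid < SKETCH_MAX_THREADS);
	return sketch_stacks[tid].depth;
}
```

Calling sketch_flush_all() after calls were recorded for two different threads leaves both stacks empty, which is the "no stale history across a discontinuity" property in miniature.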
>
> One limitation remains for instructions emulated by the kernel. In that
> case the exception return address may not match the return address
> stored in the thread stack, because execution after the exception
> return can resume one instruction ahead. The stack can still recover
> when a later return matches an upper caller. Emulated instructions are
> not a common target for performance callchain analysis, and supporting
> them would require extending the common thread-stack path to accept
> both the real target address and an adjusted address for stack
> matching, so this series leaves that extra complexity out.
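
The mismatch and the later recovery can be illustrated with a small sketch of return-address matching. It is a simplified model and not the actual thread-stack implementation: the demo_* names and the fixed depth are hypothetical, and the search-from-the-top policy is an assumption chosen to show the recovery case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DEPTH 16	/* hypothetical limit, illustration only */

/* Hypothetical stack of expected return addresses for one thread. */
struct demo_stack {
	uint64_t ret_addr[DEMO_MAX_DEPTH];
	size_t depth;
};

/* Record the address a call (or exception entry) is expected to return to. */
static void demo_push(struct demo_stack *ts, uint64_t ret_addr)
{
	if (ts->depth < DEMO_MAX_DEPTH)
		ts->ret_addr[ts->depth++] = ret_addr;
}

/*
 * Handle a return to 'target'. Search from the top of the stack for a
 * matching entry and pop down to and including it. If nothing matches,
 * leave the stack untouched and report the mismatch.
 */
static bool demo_return(struct demo_stack *ts, uint64_t target)
{
	for (size_t i = ts->depth; i > 0; i--) {
		if (ts->ret_addr[i - 1] == target) {
			ts->depth = i - 1;
			return true;
		}
	}
	return false;
}
```

With entries 0x1000 and 0x2000 pushed, a return to 0x2004 (one 4-byte instruction past the stored 0x2000) matches nothing and leaves the stale entry in place, while a later return to 0x1000 matches the upper caller and pops both entries.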
>
> The series has been tested on an Orion6 board:
>
> perf test 150 -vvv
>
> 150: Check Arm CoreSight synthesized callchain:
> --- start ---
> test child forked, pid 13528
> Test callchain push: PASS
> Test callchain pop: PASS
> ---- end(0) ----
> 150: Check Arm CoreSight synthesized callchain : Ok
>
> perf script --itrace=g16i10il64
>
> callchain_test 17468 [005] 1031003.229943: 10 instructions:
> aaaac32507c4 main+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
> ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
> ffff90bd233c call_init+0x9c (inlined)
> ffff90bd233c __libc_start_main_impl+0x9c (inlined)
> aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
>
> callchain_test 17468 [005] 1031003.229943: 10 instructions:
> aaaac3250774 do_svc+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
> aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
> aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
> aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
> ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
> ffff90bd233c call_init+0x9c (inlined)
> ffff90bd233c __libc_start_main_impl+0x9c (inlined)
> aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
>
> callchain_test 17468 [005] 1031003.229944: 10 instructions:
> ffff800080010c20 vectors+0x420 ([kernel.kallsyms])
> aaaac3250784 do_svc+0x1c (/home/kernel/leoy/test_cs_callchain/callchain_test)
> aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
> aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
> aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
> ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
> ffff90bd233c call_init+0x9c (inlined)
> ffff90bd233c __libc_start_main_impl+0x9c (inlined)
> aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
>
> Note that the test fails on the Juno board because of many
> discontinuity packets (mainly NO_SYNC elements). This is likely caused
> by FIFO overflow on the path.
>
> [1] https://lore.kernel.org/linux-arm-kernel/20200220052701.7754-1-leo.yan@linaro.org/
>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> Leo Yan (8):
> perf cs-etm: Decode ETE exception packets
> perf cs-etm: Refactor instruction size handling
> perf cs-etm: Use thread-stack for last branch entries
> perf cs-etm: Flush thread stacks after decoder reset
> perf cs-etm: Support call indentation
> perf cs-etm: Filter synthesized branch samples
> perf cs-etm: Synthesize callchains for instruction samples
> perf test: Add Arm CoreSight callchain test
>
> .../tests/shell/test_arm_coresight_callchain.sh | 235 ++++++++++++++++
> tools/perf/util/cs-etm.c | 309 ++++++++++++---------
> 2 files changed, 408 insertions(+), 136 deletions(-)
> ---
> base-commit: bd2a5be1fe731bc7548205dd148db75f1d588da2
> change-id: 20260521-b4-arm_cs_callchain_support_v1-2c2a70719bcc
>
> Best regards,
> --
> Leo Yan <leo.yan@arm.com>
>