From: Li Pengfei <ljdlns1987@gmail.com>
To: mhiramat@kernel.org, rostedt@goodmis.org
Cc: linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org,
cmllamas@google.com, zhangbo56@xiaomi.com,
Pengfei Li <lipengfei28@xiaomi.com>
Subject: [RFC PATCH v3 0/3] trace: stack trace deduplication for ftrace ring buffer
Date: Tue, 26 May 2026 19:52:42 +0800
Message-ID: <cover.1779769138.git.lipengfei28@xiaomi.com>
In-Reply-To: <20260514034916.2162517-1-lipengfei28@xiaomi.com>
From: Pengfei Li <lipengfei28@xiaomi.com>
Hi Masami, Steven, all,
This is v3 of the ftrace stackmap series. It addresses the Sashiko
review on v2 [1] that Masami pointed out.
[1] https://sashiko.dev/#/patchset/20260522104017.1668638-1-lipengfei28%40xiaomi.com
The series adds stack trace deduplication to ftrace. When the
stacktrace option is enabled, the ring buffer stores a 4-byte
stack_id instead of a full kernel stack trace, while the full
stacks are exported via tracefs.
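As a rough user-space illustration of the idea (not the kernel
implementation), deduplication can be modelled as interning each
stack in a side table and logging only its small integer id. The
names StackInterner and intern() are hypothetical and do not appear
in the series.

```python
class StackInterner:
    """Toy model: map each distinct stack (tuple of addresses) to a small id."""

    def __init__(self):
        self._ids = {}    # stack tuple -> stack_id
        self.stacks = []  # stack_id -> stack tuple (the exported side table)

    def intern(self, stack):
        """Return the id for this stack, allocating a new one on first sight."""
        key = tuple(stack)
        stack_id = self._ids.get(key)
        if stack_id is None:
            stack_id = len(self.stacks)
            self._ids[key] = stack_id
            self.stacks.append(key)
        return stack_id


interner = StackInterner()
events = [
    [0xFFFF0001, 0xFFFF0002, 0xFFFF0003],
    [0xFFFF0001, 0xFFFF0002, 0xFFFF0003],  # same stack seen again
    [0xFFFF0004, 0xFFFF0002],
]
# The "ring buffer" holds only ids; full stacks live once in interner.stacks.
ring_buffer = [interner.intern(stack) for stack in events]
```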
Rebased onto v7.1-rc5 (e8c2f9fdadee) before sending.
Changes since v2
================
Patch 1 (lock-free stackmap):
- Hot-path counters changed from atomic64_t to per-CPU local_t.
This avoids the raw_spinlock_t fallback that atomic64_t uses on
32-bit GENERIC_ATOMIC64, which would deadlock from NMI context.
- reset() now serializes against tracefs readers via an
rw_semaphore (held for write during the clearing memset, held
for read by seq_file iteration and bin snapshot construction).
synchronize_rcu() alone was insufficient because seq_file/bin
readers are in process context, not preempt-disabled.
- get_id() uses atomic_read_acquire() on smap->resetting so
subsequent loads of entry->key/val are properly ordered after
the check (an LKMM control dependency orders only later stores,
not later loads).
- All plain reads of entry->key now use READ_ONCE() to avoid
LKMM data races with the cmpxchg writer.
- val->nr in the hot path now uses READ_ONCE() to keep style
consistent with the seq_show / bin_open readers.
- stackmap_seq_next() now updates *pos past map_size on EOF so
seq_read() terminates instead of looping on the last element.
- Added a comment in the cmpxchg-claim path documenting that
two CPUs racing with the same key_hash may produce a small
number of duplicate entries; this is an accepted trade-off
for keeping the hot path lock-free.
- Removed BUG_ON in create path (the constraint is satisfied by
construction; no runtime check needed).
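The per-CPU counter change above follows a common pattern: each CPU
increments only its own slot, and a reader sums all slots. A minimal
Python model of that pattern is sketched below; PerCpuCounter is a
hypothetical name, and the sketch does not model local_t or NMI
semantics.

```python
class PerCpuCounter:
    """Toy model of a per-CPU counter: one slot per CPU, summed on read."""

    def __init__(self, nr_cpus):
        self._slots = [0] * nr_cpus

    def inc(self, cpu):
        # Hot path: touches only the calling CPU's slot, no shared lock.
        self._slots[cpu] += 1

    def read(self):
        # Slow path: aggregate across all CPUs when the stat is reported.
        return sum(self._slots)


successes = PerCpuCounter(nr_cpus=4)
for cpu in (0, 0, 1, 3, 3, 3):
    successes.inc(cpu)
total = successes.read()
```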
Patch 2 (integration):
- 'stackmap' is added to TOP_LEVEL_TRACE_FLAGS and
ZEROED_TRACE_FLAGS so the option is only exposed under the
top-level trace instance, matching the convention used for
other global-only options such as 'printk' and 'record-cmd'.
Secondary instances under tracing/instances/*/ no longer see
the option at all, instead of seeing it as a silent no-op.
- TRACE_STACK_ID added to trace_valid_entry() in trace_selftest.c
so ftrace startup selftests don't reject the entry type.
- Corrected a comment about how global_trace.stackmap is
zero-initialized (BSS, not kzalloc).
Patch 3 (docs / selftest / tooling):
- Selftest now reads trace contents BEFORE switching back to the
nop tracer (tracer_init() calls tracing_reset_online_cpus()
which would have emptied the ring buffer).
- Added 'function:tracer' to the selftest '# requires:' line so
ftracetest skips when CONFIG_FUNCTION_TRACER is disabled
instead of failing spuriously.
- Selftest grep tightened to '<stack_id' to avoid future
false positives if any other tracepoint name contains
"stack_id".
- New stackmap-instance-gate.tc selftest asserts the option and
stack_map* nodes are present on the global instance and absent
on a freshly-created secondary instance, locking in the
TOP_LEVEL_TRACE_FLAGS gating behavior introduced in patch 2.
- Documentation Performance section made vendor-neutral
("aarch64 SMP system" instead of a specific device name) and
the term "Hit rate" replaced with "Dedup rate" to match the
actual stat field name (success_rate).
- Documentation Design section now states that deduplication is
best-effort under heavy contention (cmpxchg races may produce
a small number of duplicate entries for the same stack), so
users observing entries > unique-stacks have a documented
explanation.
Test results
============
Device: Xiaomi SM8850 (ARM64), Android 16, kernel 6.12 (OGKI)
Config: CONFIG_FTRACE_STACKMAP=y, bits=14 (16384 elts, 32768 slots)
Method: 5-second capture with stacktrace trigger
Functional tests (all PASS):
- tracefs nodes (stack_map / stack_map_stat / stack_map_bin) exist
- options/stackmap writable, trace shows <stack_id N>
- stack_map text export with correct symbols
- reset clears entries when tracing stopped
- reset rejected (-EBUSY) while tracing active
- per-event trigger: only specified events get stacks
Performance (sched_switch, 5s):
entries: 466 / 16384
successes: 9159
drops: 0
success_rate: 100%
dedup rate: 95.2% (466 unique stacks / 9625 total events)
Performance (kmem_cache_alloc, 5s):
entries: 1177 / 16384
successes: 60078
drops: 0
success_rate: 100%
dedup rate: 98.1% (1177 unique stacks / 61255 total events)
Ring buffer space savings:
Event             Full stack           Stackmap                      Saving
----------------  -------------------  ----------------------------  ------
sched_switch      9625 × 88B = 847KB   12B×9625 + 88B×466 = 156KB    82%
kmem_cache_alloc  61255 × 88B = 5.4MB  12B×61255 + 88B×1177 = 839KB  84%
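The dedup rates and byte totals above can be recomputed from the
reported counts. The sketch below takes the 88-byte full-stack cost
and the 12-byte stack_id cost from the table and treats KB/MB as
decimal units; the helper names are made up for illustration.

```python
FULL_STACK_BYTES = 88  # per-event cost of a full stack entry (from the table)
STACK_ID_BYTES = 12    # per-event cost of a stack_id entry (from the table)


def dedup_rate(unique, total):
    """Fraction of events whose stack was already present in the map."""
    return 1.0 - unique / total


def saving(unique, total):
    """Return (full_bytes, stackmap_bytes, fraction_saved)."""
    full = total * FULL_STACK_BYTES
    deduped = total * STACK_ID_BYTES + unique * FULL_STACK_BYTES
    return full, deduped, 1.0 - deduped / full


for name, unique, total in [
    ("sched_switch", 466, 9625),
    ("kmem_cache_alloc", 1177, 61255),
]:
    full, deduped, frac = saving(unique, total)
    print(f"{name}: dedup {dedup_rate(unique, total):.1%}, "
          f"{full} B -> {deduped} B, saving {frac:.1%}")
```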
QEMU validation (v3 base: v7.1-rc5)
===================================
The series boots cleanly on aarch64 QEMU. A post-init smoke test
(12/12 PASS) verified all functional behaviors including:
- tracefs nodes appear with correct file modes
- stack_id events emitted, kernel symbols resolve correctly
(e.g. __schedule+0x7cc/0x1138)
- reset rejected with -EBUSY while tracing is active
- reset clears the map when tracing is stopped
- per-CPU local_t counters aggregate correctly across CPUs
- stack_map_bin magic correct (0x464D5342 'FSMB')
- 'stackmap' option visible on the global instance, hidden on
secondary instances under tracing/instances/*/
Boot-time activation via 'trace_options=stackmap,stacktrace' works:
events that fire before stackmap initialization fall back to
recording full stack traces; later events are deduplicated. No
events are dropped due to the transition.
Known limitations
=================
- Per-instance stackmap support is not included in this series.
Following the convention used for other global-only options
(PRINTK, RECORD_CMD), the 'stackmap' option is gated to the
top-level trace instance via TOP_LEVEL_TRACE_FLAGS, so it is
not exposed under tracing/instances/*/options/. Per-instance
maps would be a follow-up.
- The element pool is allocated eagerly at fs_initcall when
CONFIG_FTRACE_STACKMAP=y, regardless of whether userspace will
ever enable the option. At the default bits=14 this is roughly
8 MB of vmalloc; at the maximum bits=18, ~135 MB. The eager
allocation keeps the hot path entirely allocation-free and
avoids any allocation-failure path under tracing pressure.
Lazy allocation on first 'echo 1 > options/stackmap' is a
reasonable follow-up if maintainers prefer that trade-off.
- Deduplication is best-effort, not strict: under heavy
concurrent contention two CPUs racing in the insert path with
the same stack hash may each succeed in claiming a different
slot, producing a small number of duplicate entries for the
same stack. ref_count is then split across the duplicates.
This is intentional: it keeps the hot path lock-free and
bounds memory by the element pool size.
- The stackmap currently covers kernel stacks only.
- stack_map_bin is a best-effort snapshot, not a fully atomic export.
- trace-cmd / libtraceevent integration is left for follow-up once
the binary format settles.
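To get a feel for how the eager allocation scales with bits, the
quoted sizes can be turned into a rough per-element figure. This is a
back-of-the-envelope sketch only: it assumes slots = 2 * elements for
every bits value (quoted above only for bits=14), reads "MB" as MiB,
and attributes the whole pool to per-element storage. None of these
are confirmed by the series.

```python
MIB = 1024 * 1024


def pool_geometry(bits):
    """Return (elements, slots), assuming slots = 2 * elements."""
    elts = 1 << bits
    return elts, 2 * elts


def implied_bytes_per_element(total_mib, bits):
    """Per-element size implied by a quoted pool size (assumption-laden)."""
    elts, _ = pool_geometry(bits)
    return total_mib * MIB / elts


for bits, quoted_mib in [(14, 8), (18, 135)]:
    elts, slots = pool_geometry(bits)
    per_elt = implied_bytes_per_element(quoted_mib, bits)
    print(f"bits={bits}: {elts} elts, {slots} slots, "
          f"~{per_elt:.0f} B per element")
```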
Usage
=====
echo 1 > /sys/kernel/debug/tracing/options/stackmap
echo 1 > /sys/kernel/debug/tracing/options/stacktrace
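The same two writes can be scripted. A minimal sketch follows,
assuming the tracefs mount point shown above; enable_stackmap() is a
hypothetical helper, not part of the series, and on a real system it
needs root privileges.

```python
from pathlib import Path


def enable_stackmap(tracefs_root="/sys/kernel/debug/tracing"):
    """Write '1' to options/stackmap and options/stacktrace under tracefs_root."""
    options = Path(tracefs_root) / "options"
    for name in ("stackmap", "stacktrace"):
        (options / name).write_text("1")
```

Passing a different tracefs_root (for example /sys/kernel/tracing, or
a scratch directory in tests) avoids hard-coding the mount point.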
Pengfei Li (3):
trace: add lock-free stackmap for stack trace deduplication
trace: integrate stackmap into ftrace stack recording path
trace: add documentation, selftest and tooling for stackmap
Documentation/trace/ftrace-stackmap.rst | 162 ++++
Documentation/trace/index.rst | 1 +
kernel/trace/Kconfig | 22 +
kernel/trace/Makefile | 1 +
kernel/trace/trace.c | 78 +-
kernel/trace/trace.h | 16 +
kernel/trace/trace_entries.h | 15 +
kernel/trace/trace_output.c | 23 +
kernel/trace/trace_selftest.c | 1 +
kernel/trace/trace_stackmap.c | 780 ++++++++++++++++++
kernel/trace/trace_stackmap.h | 57 ++
.../ftrace/test.d/ftrace/stackmap-basic.tc | 103 +++
.../test.d/ftrace/stackmap-instance-gate.tc | 42 +
tools/tracing/stackmap_dump.py | 150 ++++
14 files changed, 1449 insertions(+), 2 deletions(-)
create mode 100644 Documentation/trace/ftrace-stackmap.rst
create mode 100644 kernel/trace/trace_stackmap.c
create mode 100644 kernel/trace/trace_stackmap.h
create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/stackmap-instance-gate.tc
create mode 100755 tools/tracing/stackmap_dump.py
--
2.34.1