* [RFC PATCH 0/3] trace: stack trace deduplication for ftrace ring buffer
From: Li Pengfei @ 2026-05-14 3:49 UTC
To: linux-trace-kernel
Cc: rostedt, mhiramat, linux-kernel, cmllamas, zhangbo56, lipengfei28
From: Pengfei Li <lipengfei28@xiaomi.com>
Hi Steven, all,
This series adds stack trace deduplication to ftrace, reducing ring
buffer usage for stack data by roughly 75-80% in our tests when the
stacktrace option is enabled.
Problem:
When the stacktrace option is enabled, each trace event stores a full
kernel stack (typically 10-20 frames x 8 bytes = 80-160 bytes). On
production devices with 4-8MB trace buffers, this fills the buffer in
seconds, limiting the usefulness of boot-time tracing and always-on
performance monitoring.
Solution:
A lock-free hash map (modeled after tracing_map.c as suggested by
Steven [1]) that deduplicates stack traces. The ring buffer stores
only a 4-byte stack_id; full stacks are exported via tracefs.
Design (following tracing_map.c pattern):
- Lock-free insert via cmpxchg (NMI/IRQ/any context safe)
- Pre-allocated element pool (zero allocation on hot path)
- Linear probing with 2x over-provisioned table
- Per-trace_array instance support
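In essence, the hot-path insert is a claim-then-publish sequence.
A condensed sketch (linear probing and error paths omitted; see
patch 1 for the real code):

	if (!cmpxchg(&entry->key, 0, key_hash)) {	/* claim slot */
		elt = stackmap_get_elt(smap);		/* pre-allocated pool */
		elt->nr = nr_entries;
		memcpy(elt->ips, ips, trace_len);
		smp_wmb();				/* order fill before publish */
		WRITE_ONCE(entry->val, elt);		/* publish */
		return idx;				/* idx is the stack_id */
	}

Readers that see entry->key set but entry->val still NULL simply
retry, so a half-finished insert is never observed.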
We adopted the same lock-free algorithm as tracing_map but with a
purpose-built data structure, because tracing_map's API is designed
for histogram aggregation with fixed-size keys and sum/var fields,
while our use case requires variable-length stack traces with
reference counting.
Test results (ARM64, Qualcomm SM8850, kernel 6.12):
- kmem_cache_alloc events, 1 second capture:
774 unique stacks, 8264 hits, 0 drops, 100% hit rate
Ring buffer savings: 795KB -> 176KB (78% reduction)
- Function tracer, 3 seconds:
3632 unique stacks, 25466 hits, 0 drops
Ring buffer savings: 2.5MB -> 653KB (74% reduction)
Note: An earlier prototype using rhashtable crashed in IRQ context
(BUG at rhashtable.h:912), which led us to adopt the tracing_map
cmpxchg-based approach.
Usage:
echo 1 > /sys/kernel/debug/tracing/options/stackmap
echo 1 > /sys/kernel/debug/tracing/options/stacktrace
# trace output: <stack_id 42>
# resolve: cat /sys/kernel/debug/tracing/stack_map
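# offline analysis with the tool added in patch 3 (vmlinux path
# is an example):
cat /sys/kernel/debug/tracing/stack_map_bin > stack_map.bin
python3 tools/tracing/stackmap_dump.py stack_map.bin --vmlinux vmlinux --top 10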
[1] https://lore.kernel.org/all/20260513085145.30dd23e0@fedora/
Pengfei Li (3):
trace: add lock-free stackmap for stack trace deduplication
trace: integrate stackmap into ftrace stack recording path
trace: add documentation, selftest and tooling for stackmap
Documentation/trace/ftrace-stackmap.rst | 111 ++++
kernel/trace/Kconfig | 21 +
kernel/trace/Makefile | 1 +
kernel/trace/trace.c | 46 ++
kernel/trace/trace.h | 16 +
kernel/trace/trace_entries.h | 15 +
kernel/trace/trace_output.c | 23 +
kernel/trace/trace_stackmap.c | 569 ++++++++++++++++++
kernel/trace/trace_stackmap.h | 54 ++
.../ftrace/test.d/ftrace/stackmap-basic.tc | 74 +++
tools/tracing/stackmap_dump.py | 120 ++++
11 files changed, 1050 insertions(+)
create mode 100644 Documentation/trace/ftrace-stackmap.rst
create mode 100644 kernel/trace/trace_stackmap.c
create mode 100644 kernel/trace/trace_stackmap.h
create mode 100755 tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc
create mode 100755 tools/tracing/stackmap_dump.py
--
2.34.1
* [RFC PATCH 1/3] trace: add lock-free stackmap for stack trace deduplication
From: Li Pengfei @ 2026-05-14 3:49 UTC
To: linux-trace-kernel
Cc: rostedt, mhiramat, linux-kernel, cmllamas, zhangbo56, lipengfei28
From: Pengfei Li <lipengfei28@xiaomi.com>
Add a lock-free hash map (ftrace_stackmap) that deduplicates kernel
stack traces for the ftrace ring buffer. Instead of storing full
stack traces (80-160 bytes each) in the ring buffer for every event,
ftrace can store a 4-byte stack_id when the stackmap option is enabled.
The implementation is modeled after tracing_map.c (used by hist
triggers), using the same lock-free design based on Dr. Cliff Click's
non-blocking hash table algorithm:
- Lock-free insert via cmpxchg (safe in NMI/IRQ/any context)
- Pre-allocated element pool (zero allocation on hot path)
- Linear probing with 2x over-provisioned table
- Per-trace_array instance support
The stackmap is exported via three tracefs nodes:
- stack_map: text export with symbol resolution
- stack_map_stat: statistics (entries, hits, drops, hit_rate)
- stack_map_bin: binary export for efficient userspace consumption
Kernel command line parameter:
- ftrace_stackmap.bits=N: set map capacity (2^N unique stacks)
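For example, the default bits=14 on a 64-bit kernel pre-allocates
16384 elements of 520 bytes each (u32 nr + atomic_t ref_count +
64 x 8-byte IPs) plus a 32768-slot entry table (~512 KB), about
9 MB in total; bits=10 shrinks that to roughly 560 KB.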
Test results on ARM64 (SM8850, Android 16, kernel 6.12):
- 774 unique stacks from kmem_cache_alloc in 1 second
- 100% hit rate, 0 drops
- 92% hit rate under heavy load (all kmem events)
Signed-off-by: Pengfei Li <lipengfei28@xiaomi.com>
---
kernel/trace/Kconfig | 21 ++
kernel/trace/Makefile | 1 +
kernel/trace/trace_stackmap.c | 569 ++++++++++++++++++++++++++++++++++
kernel/trace/trace_stackmap.h | 54 ++++
4 files changed, 645 insertions(+)
create mode 100644 kernel/trace/trace_stackmap.c
create mode 100644 kernel/trace/trace_stackmap.h
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e130da35808f..2a63fd2c9a96 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -412,6 +412,27 @@ config STACK_TRACER
Say N if unsure.
+config FTRACE_STACKMAP
+ bool "Ftrace stack map deduplication"
+ depends on TRACING
+ depends on STACKTRACE
+ select KALLSYMS
+ help
+ This enables a global stack trace hash table for ftrace, inspired
+ by eBPF's BPF_MAP_TYPE_STACK_TRACE. When enabled, ftrace can store
+ only a stack_id in the ring buffer instead of the full stack trace,
+ significantly reducing trace buffer usage when the same call stacks
+ appear repeatedly.
+
+ The deduplicated stacks are exported via:
+ /sys/kernel/debug/tracing/stack_map
+
+ Writing to this file resets the stack map. Reading shows all unique
+ stacks with their stack_id and reference count.
+
+ Say Y if you want to reduce ftrace buffer usage for stack traces.
+ Say N if unsure.
+
config TRACE_PREEMPT_TOGGLE
bool
help
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 1decdce8cbef..f1b6175099cc 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_HWLAT_TRACER) += trace_hwlat.o
obj-$(CONFIG_OSNOISE_TRACER) += trace_osnoise.o
obj-$(CONFIG_NOP_TRACER) += trace_nop.o
obj-$(CONFIG_STACK_TRACER) += trace_stack.o
+obj-$(CONFIG_FTRACE_STACKMAP) += trace_stackmap.o
obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
diff --git a/kernel/trace/trace_stackmap.c b/kernel/trace/trace_stackmap.c
new file mode 100644
index 000000000000..c402e7e7f902
--- /dev/null
+++ b/kernel/trace/trace_stackmap.c
@@ -0,0 +1,569 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Ftrace Stack Map - Lock-free stack trace deduplication for ftrace
+ *
+ * Modeled after tracing_map.c (used by hist triggers), this provides
+ * a lock-free hash map optimized for the ftrace hot path. The design
+ * is based on Dr. Cliff Click's non-blocking hash table algorithm.
+ *
+ * Key properties:
+ * - Lock-free insert via cmpxchg (safe in NMI/IRQ/any context)
+ * - Pre-allocated element pool (zero allocation on hot path)
+ * - Linear probing with 2x over-provisioned table
+ * - Per-trace_array instance support
+ *
+ * The 32-bit jhash of the stack IPs is used as the hash table key.
+ * On hash collision (different stacks, same 32-bit hash), linear
+ * probing finds the next slot. Full stack comparison (memcmp) is
+ * used to confirm matches.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/jhash.h>
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/vmalloc.h>
+#include <linux/atomic.h>
+#include <linux/random.h>
+#include <linux/uaccess.h>
+
+#include "trace.h"
+#include "trace_stackmap.h"
+
+/*
+ * Each pre-allocated element holds one unique stack trace.
+ * Fixed size: MAX_DEPTH entries regardless of actual depth.
+ */
+struct stackmap_elt {
+ u32 nr; /* actual number of IPs */
+ atomic_t ref_count;
+ unsigned long ips[FTRACE_STACKMAP_MAX_DEPTH];
+};
+
+/*
+ * Hash table entry: a 32-bit key (jhash of stack) + pointer to elt.
+ * key == 0 means the slot is free.
+ */
+struct stackmap_entry {
+ u32 key; /* 0 = free, non-zero = jhash */
+ struct stackmap_elt *val; /* NULL until fully published */
+};
+
+struct ftrace_stackmap {
+ unsigned int map_bits;
+ unsigned int map_size; /* 1 << (map_bits + 1) */
+ unsigned int max_elts; /* 1 << map_bits */
+ atomic_t next_elt; /* index into elts pool */
+ struct stackmap_entry *entries; /* hash table */
+ struct stackmap_elt **elts; /* pre-allocated pool */
+ atomic_t resetting;
+ atomic64_t hits;
+ atomic64_t drops;
+};
+
+static u32 stackmap_hash_seed;
+
+static unsigned int stackmap_map_bits = 14; /* 16384 elts, 32768 slots */
+static int __init stackmap_bits_setup(char *str)
+{
+ unsigned long val;
+
+ if (kstrtoul(str, 0, &val))
+ return -EINVAL;
+ val = clamp_val(val, 10, 20); /* 1K - 1M elts */
+ stackmap_map_bits = val;
+ return 0;
+}
+early_param("ftrace_stackmap.bits", stackmap_bits_setup);
+
+/* --- Element pool --- */
+
+static struct stackmap_elt *stackmap_get_elt(struct ftrace_stackmap *smap)
+{
+ int idx;
+
+ idx = atomic_fetch_add_unless(&smap->next_elt, 1, smap->max_elts);
+ if (idx < smap->max_elts)
+ return smap->elts[idx];
+ return NULL;
+}
+
+static int stackmap_alloc_elts(struct ftrace_stackmap *smap)
+{
+ unsigned int i;
+
+ smap->elts = vzalloc(sizeof(*smap->elts) * smap->max_elts);
+ if (!smap->elts)
+ return -ENOMEM;
+
+ for (i = 0; i < smap->max_elts; i++) {
+ smap->elts[i] = kzalloc(sizeof(struct stackmap_elt), GFP_KERNEL);
+ if (!smap->elts[i])
+ goto fail;
+ }
+ return 0;
+fail:
+ while (i--)
+ kfree(smap->elts[i]);
+ vfree(smap->elts);
+ smap->elts = NULL;
+ return -ENOMEM;
+}
+
+static void stackmap_free_elts(struct ftrace_stackmap *smap)
+{
+ unsigned int i;
+
+ if (!smap->elts)
+ return;
+ for (i = 0; i < smap->max_elts; i++)
+ kfree(smap->elts[i]);
+ vfree(smap->elts);
+ smap->elts = NULL;
+}
+
+/* --- Create / Destroy / Reset --- */
+
+struct ftrace_stackmap *ftrace_stackmap_create(void)
+{
+ struct ftrace_stackmap *smap;
+ static bool seed_initialized;
+ int err;
+
+ smap = kzalloc(sizeof(*smap), GFP_KERNEL);
+ if (!smap)
+ return ERR_PTR(-ENOMEM);
+
+ smap->map_bits = stackmap_map_bits;
+ smap->max_elts = 1 << smap->map_bits;
+ smap->map_size = smap->max_elts * 2; /* 2x over-provision */
+
+ smap->entries = vzalloc(sizeof(*smap->entries) * smap->map_size);
+ if (!smap->entries) {
+ kfree(smap);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ err = stackmap_alloc_elts(smap);
+ if (err) {
+ vfree(smap->entries);
+ kfree(smap);
+ return ERR_PTR(err);
+ }
+
+ atomic_set(&smap->next_elt, 0);
+ atomic_set(&smap->resetting, 0);
+ atomic64_set(&smap->hits, 0);
+ atomic64_set(&smap->drops, 0);
+
+ if (!seed_initialized) {
+ stackmap_hash_seed = get_random_u32();
+ seed_initialized = true;
+ }
+
+ return smap;
+}
+
+void ftrace_stackmap_destroy(struct ftrace_stackmap *smap)
+{
+ if (!smap || IS_ERR(smap))
+ return;
+ stackmap_free_elts(smap);
+ vfree(smap->entries);
+ kfree(smap);
+}
+
+void ftrace_stackmap_reset(struct ftrace_stackmap *smap)
+{
+ unsigned int i;
+
+ if (!smap)
+ return;
+
+ /*
+ * Reset protocol:
+ *
+ * 1. Set resetting=1 so get_id() returns -EINVAL immediately.
+ * get_id() callers in NMI/IRQ context will see this and bail
+ * out before touching entries or elts.
+ *
+ * 2. smp_mb() ensures the resetting store is visible to all CPUs
+ * before we start clearing entries. Any get_id() that already
+ * passed the resetting check will complete its cmpxchg and
+ * WRITE_ONCE(entry->val) before we memset, because:
+ * - the cmpxchg claims the slot atomically
+ * - WRITE_ONCE(entry->val) happens before we clear entries
+ * We accept that a handful of in-flight inserts may write into
+ * entries that we are about to clear; those entries will simply
+ * be wiped by the memset below, which is safe.
+ *
+ * 3. Clear entries table, then reset elt pool.
+ *
+ * 4. Clear resetting=0 with another smp_mb() so new get_id()
+ * calls see a fully reset map.
+ */
+ atomic_set(&smap->resetting, 1);
+ smp_mb();
+
+ /* Clear hash table */
+ memset(smap->entries, 0, sizeof(*smap->entries) * smap->map_size);
+
+ /* Reset elt pool */
+ for (i = 0; i < smap->max_elts; i++)
+ memset(smap->elts[i], 0, sizeof(struct stackmap_elt));
+
+ atomic_set(&smap->next_elt, 0);
+ atomic64_set(&smap->hits, 0);
+ atomic64_set(&smap->drops, 0);
+
+ smp_mb();
+ atomic_set(&smap->resetting, 0);
+}
+
+/* --- Core: get_id (lock-free, NMI-safe) --- */
+
+int ftrace_stackmap_get_id(struct ftrace_stackmap *smap,
+ unsigned long *ips, unsigned int nr_entries)
+{
+ u32 key_hash, idx, test_key, trace_len;
+ struct stackmap_entry *entry;
+ struct stackmap_elt *val;
+ int dup_try = 0;
+
+ if (!smap || !nr_entries || atomic_read(&smap->resetting))
+ return -EINVAL;
+ if (nr_entries > FTRACE_STACKMAP_MAX_DEPTH)
+ nr_entries = FTRACE_STACKMAP_MAX_DEPTH;
+
+ trace_len = nr_entries * sizeof(unsigned long);
+ /*
+ * jhash2() requires the length in u32 units and the data to be
+ * u32-aligned. On 64-bit kernels sizeof(unsigned long)==8, so
+ * trace_len is always a multiple of 8 (hence of 4). Use jhash2
+ * directly; the cast to u32* is safe because ips[] is naturally
+ * aligned to sizeof(unsigned long) >= 4.
+ */
+ key_hash = jhash2((const u32 *)ips, trace_len / sizeof(u32),
+ stackmap_hash_seed);
+ if (key_hash == 0)
+ key_hash = 1; /* 0 means free slot */
+
+ idx = key_hash >> (32 - (smap->map_bits + 1));
+
+ while (1) {
+ idx &= (smap->map_size - 1);
+ entry = &smap->entries[idx];
+ test_key = entry->key;
+
+ if (test_key && test_key == key_hash) {
+ val = READ_ONCE(entry->val);
+ if (val && val->nr == nr_entries &&
+ memcmp(val->ips, ips, trace_len) == 0) {
+ atomic_inc(&val->ref_count);
+ atomic64_inc(&smap->hits);
+ return (int)idx;
+ } else if (unlikely(!val)) {
+ /* Another CPU is mid-insert; retry */
+ dup_try++;
+ if (dup_try > smap->map_size) {
+ atomic64_inc(&smap->drops);
+ break;
+ }
+ continue;
+ }
+ }
+
+ if (!test_key) {
+ /* Free slot: try to claim it */
+ if (!cmpxchg(&entry->key, 0, key_hash)) {
+ struct stackmap_elt *elt;
+
+ elt = stackmap_get_elt(smap);
+ if (!elt) {
+ /*
+ * Pool exhausted. We claimed this slot with
+ * cmpxchg but cannot fill it. Leave key set
+ * so the slot stays "claimed but empty" —
+ * future lookups will skip it (val == NULL
+ * triggers the mid-insert retry path which
+ * will eventually drop). This is safer than
+ * writing key=0 without cmpxchg, which could
+ * race with another CPU's cmpxchg on the same
+ * slot.
+ */
+ atomic64_inc(&smap->drops);
+ break;
+ }
+
+ elt->nr = nr_entries;
+ atomic_set(&elt->ref_count, 1);
+ memcpy(elt->ips, ips, trace_len);
+
+ /* Ensure elt is fully visible before publish */
+ smp_wmb();
+ WRITE_ONCE(entry->val, elt);
+ atomic64_inc(&smap->hits);
+ return (int)idx;
+ } else {
+ /* cmpxchg failed; someone else claimed it */
+ dup_try++;
+ continue;
+ }
+ }
+
+ idx++;
+ dup_try++;
+ if (dup_try > smap->map_size) {
+ atomic64_inc(&smap->drops);
+ break;
+ }
+ }
+
+ return -ENOSPC;
+}
+
+/* --- Text export: /sys/kernel/debug/tracing/stack_map --- */
+
+struct stackmap_seq_private {
+ struct ftrace_stackmap *smap;
+};
+
+static void *stackmap_seq_start(struct seq_file *m, loff_t *pos)
+{
+ struct stackmap_seq_private *priv = m->private;
+ struct ftrace_stackmap *smap = priv->smap;
+ u32 i;
+
+ if (!smap)
+ return NULL;
+ for (i = *pos; i < smap->map_size; i++) {
+ if (smap->entries[i].key && smap->entries[i].val) {
+ *pos = i;
+ return &smap->entries[i];
+ }
+ }
+ return NULL;
+}
+
+static void *stackmap_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct stackmap_seq_private *priv = m->private;
+ struct ftrace_stackmap *smap = priv->smap;
+ u32 i;
+
+ for (i = *pos + 1; i < smap->map_size; i++) {
+ if (smap->entries[i].key && smap->entries[i].val) {
+ *pos = i;
+ return &smap->entries[i];
+ }
+ }
+ *pos = i;
+ return NULL;
+}
+
+static void stackmap_seq_stop(struct seq_file *m, void *v) { }
+
+static int stackmap_seq_show(struct seq_file *m, void *v)
+{
+ struct stackmap_entry *entry = v;
+ struct stackmap_elt *elt = entry->val;
+ struct stackmap_seq_private *priv = m->private;
+ u32 idx = entry - priv->smap->entries;
+ u32 i;
+
+ if (!elt)
+ return 0;
+
+ seq_printf(m, "stack_id %u [ref %u, depth %u]\n",
+ idx, atomic_read(&elt->ref_count), elt->nr);
+ for (i = 0; i < elt->nr; i++)
+ seq_printf(m, " [%u] %pS\n", i, (void *)elt->ips[i]);
+ seq_putc(m, '\n');
+ return 0;
+}
+
+static const struct seq_operations stackmap_seq_ops = {
+ .start = stackmap_seq_start,
+ .next = stackmap_seq_next,
+ .stop = stackmap_seq_stop,
+ .show = stackmap_seq_show,
+};
+
+static int stackmap_open(struct inode *inode, struct file *file)
+{
+ struct stackmap_seq_private *priv;
+ struct seq_file *m;
+ int ret;
+
+ ret = seq_open_private(file, &stackmap_seq_ops,
+ sizeof(struct stackmap_seq_private));
+ if (ret)
+ return ret;
+ m = file->private_data;
+ priv = m->private;
+ priv->smap = inode->i_private;
+ return 0;
+}
+
+static ssize_t stackmap_write(struct file *file, const char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ struct seq_file *m = file->private_data;
+ struct stackmap_seq_private *priv = m->private;
+ char buf[8];
+ size_t n = min(count, sizeof(buf) - 1);
+
+ if (copy_from_user(buf, ubuf, n))
+ return -EFAULT;
+ buf[n] = '\0';
+ if (n == 0 || (buf[0] != '0' && strncmp(buf, "reset", 5) != 0))
+ return -EINVAL;
+
+ ftrace_stackmap_reset(priv->smap);
+ return count;
+}
+
+const struct file_operations ftrace_stackmap_fops = {
+ .open = stackmap_open,
+ .read = seq_read,
+ .write = stackmap_write,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+/* --- Stats --- */
+
+static int stackmap_stat_show(struct seq_file *m, void *v)
+{
+ struct ftrace_stackmap *smap = m->private;
+ u32 entries;
+ u64 hits, drops;
+
+ if (!smap) {
+ seq_puts(m, "stackmap not initialized\n");
+ return 0;
+ }
+
+ entries = atomic_read(&smap->next_elt);
+ hits = atomic64_read(&smap->hits);
+ drops = atomic64_read(&smap->drops);
+
+ seq_printf(m, "entries: %u / %u\n", entries, smap->max_elts);
+ seq_printf(m, "table_size: %u\n", smap->map_size);
+ seq_printf(m, "hits: %llu\n", hits);
+ seq_printf(m, "drops: %llu\n", drops);
+ if (hits + drops > 0)
+ seq_printf(m, "hit_rate: %llu%%\n",
+ hits * 100 / (hits + drops));
+ return 0;
+}
+
+static int stackmap_stat_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, stackmap_stat_show, inode->i_private);
+}
+
+const struct file_operations ftrace_stackmap_stat_fops = {
+ .open = stackmap_stat_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+/* --- Binary export --- */
+
+struct stackmap_bin_snapshot {
+ size_t size;
+ char data[];
+};
+
+static int stackmap_bin_open(struct inode *inode, struct file *file)
+{
+ struct ftrace_stackmap *smap = inode->i_private;
+ struct stackmap_bin_snapshot *snap;
+ struct ftrace_stackmap_bin_header *hdr;
+ size_t alloc_size, off;
+ u32 i, nr_stacks;
+
+ if (!smap)
+ return -ENODEV;
+
+ /*
+ * Allocate based on actual entry count, not max_elts worst case.
+ * Each entry needs a header struct plus up to MAX_DEPTH u64 IPs.
+ * Add 1 to nr_entries to avoid zero-size alloc on empty map.
+ */
+ {
+ u32 nr_entries = atomic_read(&smap->next_elt);
+
+ alloc_size = sizeof(*hdr) + (nr_entries + 1) *
+ (sizeof(struct ftrace_stackmap_bin_entry) +
+ FTRACE_STACKMAP_MAX_DEPTH * sizeof(u64));
+ }
+
+ snap = vmalloc(sizeof(*snap) + alloc_size);
+ if (!snap)
+ return -ENOMEM;
+
+ hdr = (struct ftrace_stackmap_bin_header *)snap->data;
+ hdr->magic = FTRACE_STACKMAP_BIN_MAGIC;
+ hdr->version = FTRACE_STACKMAP_BIN_VERSION;
+ hdr->reserved = 0;
+ off = sizeof(*hdr);
+ nr_stacks = 0;
+
+ for (i = 0; i < smap->map_size; i++) {
+ struct stackmap_entry *entry = &smap->entries[i];
+ struct stackmap_elt *elt;
+ struct ftrace_stackmap_bin_entry *e;
+ u64 *ips_out;
+ u32 k;
+
+ if (!entry->key)
+ continue;
+ elt = READ_ONCE(entry->val);
+ if (!elt)
+ continue;
+
+		/*
+		 * alloc_size was computed from next_elt before this loop;
+		 * concurrent inserts may have published more entries since
+		 * then. Never write past the snapshot buffer.
+		 */
+		if (off + sizeof(*e) +
+		    FTRACE_STACKMAP_MAX_DEPTH * sizeof(u64) > alloc_size)
+			break;
+
+		e = (struct ftrace_stackmap_bin_entry *)(snap->data + off);
+ e->stack_id = i;
+ e->nr = elt->nr;
+ e->ref_count = atomic_read(&elt->ref_count);
+ e->reserved = 0;
+ off += sizeof(*e);
+
+ ips_out = (u64 *)(snap->data + off);
+ for (k = 0; k < elt->nr; k++)
+ ips_out[k] = (u64)elt->ips[k];
+ off += elt->nr * sizeof(u64);
+ nr_stacks++;
+ }
+
+ hdr->nr_stacks = nr_stacks;
+ snap->size = off;
+ file->private_data = snap;
+ return 0;
+}
+
+static ssize_t stackmap_bin_read(struct file *file, char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ struct stackmap_bin_snapshot *snap = file->private_data;
+
+ if (!snap)
+ return -EINVAL;
+ return simple_read_from_buffer(ubuf, count, ppos, snap->data, snap->size);
+}
+
+static int stackmap_bin_release(struct inode *inode, struct file *file)
+{
+ vfree(file->private_data);
+ return 0;
+}
+
+const struct file_operations ftrace_stackmap_bin_fops = {
+ .open = stackmap_bin_open,
+ .read = stackmap_bin_read,
+ .llseek = default_llseek,
+ .release = stackmap_bin_release,
+};
diff --git a/kernel/trace/trace_stackmap.h b/kernel/trace/trace_stackmap.h
new file mode 100644
index 000000000000..74ad649a79f7
--- /dev/null
+++ b/kernel/trace/trace_stackmap.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TRACE_STACKMAP_H
+#define _TRACE_STACKMAP_H
+
+#include <linux/types.h>
+#include <linux/atomic.h>
+
+#define FTRACE_STACKMAP_MAX_DEPTH 64
+
+/* Binary export format */
+#define FTRACE_STACKMAP_BIN_MAGIC 0x46534D42 /* 'FSMB' */
+#define FTRACE_STACKMAP_BIN_VERSION 2
+
+struct ftrace_stackmap_bin_header {
+ u32 magic;
+ u32 version;
+ u32 nr_stacks;
+ u32 reserved;
+};
+
+struct ftrace_stackmap_bin_entry {
+ u32 stack_id;
+ u32 nr;
+ u32 ref_count;
+ u32 reserved;
+ /* followed by u64 ips[nr] */
+};
+
+#ifdef CONFIG_FTRACE_STACKMAP
+
+struct ftrace_stackmap;
+
+struct ftrace_stackmap *ftrace_stackmap_create(void);
+void ftrace_stackmap_destroy(struct ftrace_stackmap *smap);
+int ftrace_stackmap_get_id(struct ftrace_stackmap *smap,
+ unsigned long *ips, unsigned int nr_entries);
+void ftrace_stackmap_reset(struct ftrace_stackmap *smap);
+
+extern const struct file_operations ftrace_stackmap_fops;
+extern const struct file_operations ftrace_stackmap_stat_fops;
+extern const struct file_operations ftrace_stackmap_bin_fops;
+
+#else
+
+struct ftrace_stackmap;
+static inline struct ftrace_stackmap *ftrace_stackmap_create(void) { return NULL; }
+static inline void ftrace_stackmap_destroy(struct ftrace_stackmap *s) { }
+static inline int ftrace_stackmap_get_id(struct ftrace_stackmap *s,
+ unsigned long *ips, unsigned int n)
+{ return -ENOSYS; }
+static inline void ftrace_stackmap_reset(struct ftrace_stackmap *s) { }
+
+#endif
+#endif /* _TRACE_STACKMAP_H */
--
2.34.1
* [RFC PATCH 2/3] trace: integrate stackmap into ftrace stack recording path
From: Li Pengfei @ 2026-05-14 3:49 UTC
To: linux-trace-kernel
Cc: rostedt, mhiramat, linux-kernel, cmllamas, zhangbo56, lipengfei28
From: Pengfei Li <lipengfei28@xiaomi.com>
Add TRACE_STACK_ID event type and integrate ftrace_stackmap into
__ftrace_trace_stack(). When the 'stackmap' trace option is enabled,
the stack recording path stores a 4-byte stack_id in the ring buffer
instead of the full stack trace.
Changes:
- New TRACE_STACK_ID in trace_type enum
- New stack_id_entry in trace_entries.h (just 'int stack_id')
- New TRACE_ITER_STACKMAP trace option flag
- Modified __ftrace_trace_stack() to call ftrace_stackmap_get_id()
when stackmap option is active
- Added stack_id print handler in trace_output.c
- Added stackmap field to struct trace_array (per-instance support)
The stack_id event is committed unconditionally (no filter check),
since it is a synthetic side-event tied to a parent event that has
already passed filtering.
Fallback behavior: if stackmap returns an error (pool exhausted or
resetting), the full stack trace is recorded as before.
Usage:
echo 1 > /sys/kernel/debug/tracing/options/stackmap
echo 1 > /sys/kernel/debug/tracing/options/stacktrace
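For example, resolving a stack_id seen in the trace (task name,
PID and timestamp illustrative):

    # trace:  <...>-1234  [001] d..2.  99.123456: <stack_id 42>
    grep -A 20 '^stack_id 42 ' /sys/kernel/debug/tracing/stack_map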
Signed-off-by: Pengfei Li <lipengfei28@xiaomi.com>
---
kernel/trace/trace.c | 46 ++++++++++++++++++++++++++++++++++++
kernel/trace/trace.h | 16 +++++++++++++
kernel/trace/trace_entries.h | 15 ++++++++++++
kernel/trace/trace_output.c | 23 ++++++++++++++++++
4 files changed, 100 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6eb4d3097a4d..c72cb8491217 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -57,6 +57,7 @@
#include "trace.h"
#include "trace_output.h"
+#include "trace_stackmap.h"
#ifdef CONFIG_FTRACE_STARTUP_TEST
/*
@@ -2184,6 +2185,37 @@ void __ftrace_trace_stack(struct trace_array *tr,
}
#endif
+#ifdef CONFIG_FTRACE_STACKMAP
+ /*
+ * If stackmap dedup is enabled, try to store only the stack_id
+ * in the ring buffer instead of the full stack trace.
+ */
+ if (tr->trace_flags & TRACE_ITER_STACKMAP) {
+ struct stack_id_entry *sid_entry;
+ int sid;
+
+ sid = ftrace_stackmap_get_id(tr->stackmap, fstack->calls, nr_entries);
+ if (sid >= 0) {
+ event = __trace_buffer_lock_reserve(buffer,
+ TRACE_STACK_ID,
+ sizeof(*sid_entry), trace_ctx);
+ if (!event)
+ goto out;
+ sid_entry = ring_buffer_event_data(event);
+ sid_entry->stack_id = sid;
+ /*
+ * stack_id is a synthetic side-event attached to a
+ * primary trace event that was already subject to
+ * filtering. No per-event filter is defined for
+ * TRACE_STACK_ID, so commit unconditionally.
+ */
+ __buffer_unlock_commit(buffer, event);
+ goto out;
+ }
+ /* Fall through to full stack on stackmap failure */
+ }
+#endif
+
event = __trace_buffer_lock_reserve(buffer, TRACE_STACK,
struct_size(entry, caller, nr_entries),
trace_ctx);
@@ -9222,6 +9254,20 @@ static __init void tracer_init_tracefs_work_func(struct work_struct *work)
NULL, &tracing_dyn_info_fops);
#endif
+#ifdef CONFIG_FTRACE_STACKMAP
+ global_trace.stackmap = ftrace_stackmap_create();
+ if (!IS_ERR(global_trace.stackmap)) {
+ trace_create_file("stack_map", TRACE_MODE_WRITE, NULL,
+ global_trace.stackmap, &ftrace_stackmap_fops);
+ trace_create_file("stack_map_stat", TRACE_MODE_READ, NULL,
+ global_trace.stackmap, &ftrace_stackmap_stat_fops);
+ trace_create_file("stack_map_bin", TRACE_MODE_READ, NULL,
+ global_trace.stackmap, &ftrace_stackmap_bin_fops);
+ } else {
+ pr_warn("ftrace stackmap init failed, dedup disabled\n");
+ global_trace.stackmap = NULL;
+ }
+#endif
create_trace_instances(NULL);
update_tracer_options();
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..74f421a89347 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -57,6 +57,7 @@ enum trace_type {
TRACE_TIMERLAT,
TRACE_RAW_DATA,
TRACE_FUNC_REPEATS,
+ TRACE_STACK_ID,
__TRACE_LAST_TYPE,
};
@@ -453,6 +454,9 @@ struct trace_array {
struct cond_snapshot *cond_snapshot;
#endif
struct trace_func_repeats __percpu *last_func_repeats;
+#ifdef CONFIG_FTRACE_STACKMAP
+ struct ftrace_stackmap *stackmap;
+#endif
/*
* On boot up, the ring buffer is set to the minimum size, so that
* we do not waste memory on systems that are not using tracing.
@@ -579,6 +583,8 @@ extern void __ftrace_bad_type(void);
TRACE_GRAPH_RET); \
IF_ASSIGN(var, ent, struct func_repeats_entry, \
TRACE_FUNC_REPEATS); \
+ IF_ASSIGN(var, ent, struct stack_id_entry, \
+ TRACE_STACK_ID); \
__ftrace_bad_type(); \
} while (0)
@@ -1449,7 +1455,16 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
# define STACK_FLAGS
#endif
+#ifdef CONFIG_FTRACE_STACKMAP
+# define STACKMAP_FLAGS \
+ C(STACKMAP, "stackmap"),
+#else
+# define STACKMAP_FLAGS
+# define TRACE_ITER_STACKMAP 0UL
+#endif
+
#ifdef CONFIG_FUNCTION_PROFILER
# define PROFILER_FLAGS \
C(PROF_TEXT_OFFSET, "prof-text-offset"),
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -1506,6 +1521,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
FUNCTION_FLAGS \
FGRAPH_FLAGS \
STACK_FLAGS \
+ STACKMAP_FLAGS \
BRANCH_FLAGS \
PROFILER_FLAGS \
FPROFILE_FLAGS
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 54417468fdeb..89ed14b7e5fd 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -250,6 +250,21 @@ FTRACE_ENTRY(user_stack, userstack_entry,
(void *)__entry->caller[6], (void *)__entry->caller[7])
);
+/*
+ * Stack ID entry - stores only a stack_id referencing the stackmap.
+ * Used when CONFIG_FTRACE_STACKMAP is enabled to deduplicate stacks.
+ */
+FTRACE_ENTRY(stack_id, stack_id_entry,
+
+ TRACE_STACK_ID,
+
+ F_STRUCT(
+ __field( int, stack_id )
+ ),
+
+ F_printk("<stack_id %d>", __entry->stack_id)
+);
+
/*
* trace_printk entry:
*/
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index a5ad76175d10..68678ea88159 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1517,6 +1517,28 @@ static struct trace_event trace_user_stack_event = {
.funcs = &trace_user_stack_funcs,
};
+/* TRACE_STACK_ID */
+static enum print_line_t trace_stack_id_print(struct trace_iterator *iter,
+ int flags, struct trace_event *event)
+{
+ struct stack_id_entry *field;
+ struct trace_seq *s = &iter->seq;
+
+ trace_assign_type(field, iter->ent);
+ trace_seq_printf(s, "<stack_id %d>\n", field->stack_id);
+
+ return trace_handle_return(s);
+}
+
+static struct trace_event_functions trace_stack_id_funcs = {
+ .trace = trace_stack_id_print,
+};
+
+static struct trace_event trace_stack_id_event = {
+ .type = TRACE_STACK_ID,
+ .funcs = &trace_stack_id_funcs,
+};
+
/* TRACE_HWLAT */
static enum print_line_t
trace_hwlat_print(struct trace_iterator *iter, int flags,
@@ -1908,6 +1930,7 @@ static struct trace_event *events[] __initdata = {
&trace_wake_event,
&trace_stack_event,
&trace_user_stack_event,
+ &trace_stack_id_event,
&trace_bputs_event,
&trace_bprint_event,
&trace_print_event,
--
2.34.1
* [RFC PATCH 3/3] trace: add documentation, selftest and tooling for stackmap
From: Li Pengfei @ 2026-05-14 3:49 UTC
To: linux-trace-kernel
Cc: rostedt, mhiramat, linux-kernel, cmllamas, zhangbo56, lipengfei28
From: Pengfei Li <lipengfei28@xiaomi.com>
Add supporting files for the ftrace stackmap feature:
Documentation/trace/ftrace-stackmap.rst:
Comprehensive documentation covering design, usage, tracefs
interface, binary format, and performance characteristics.
tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc:
Basic functional selftest that verifies:
- stackmap tracefs nodes exist
- enabling stackmap + stacktrace produces stack_id events
- stack_map_stat shows non-zero hits
- reset clears entries
tools/tracing/stackmap_dump.py:
Python script to parse the binary stack_map_bin export.
Supports offline symbol resolution via addr2line, JSON output,
and top-N filtering by ref_count.
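Illustrative --json output (key names match the script; the
address and symbol values here are made up):

  [
    {
      "stack_id": 42,
      "ref_count": 1337,
      "ips": ["0xffffffc008123456", "..."],
      "symbols": ["schedule", "..."]
    }
  ]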
Signed-off-by: Pengfei Li <lipengfei28@xiaomi.com>
---
Documentation/trace/ftrace-stackmap.rst | 111 ++++++++++++++++
.../ftrace/test.d/ftrace/stackmap-basic.tc | 74 +++++++++++
tools/tracing/stackmap_dump.py | 120 ++++++++++++++++++
3 files changed, 305 insertions(+)
create mode 100644 Documentation/trace/ftrace-stackmap.rst
create mode 100755 tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc
create mode 100755 tools/tracing/stackmap_dump.py
diff --git a/Documentation/trace/ftrace-stackmap.rst b/Documentation/trace/ftrace-stackmap.rst
new file mode 100644
index 000000000000..8f6410d4258c
--- /dev/null
+++ b/Documentation/trace/ftrace-stackmap.rst
@@ -0,0 +1,111 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Ftrace Stack Map
+======================
+
+:Author: Pengfei Li <lipengfei28@xiaomi.com>
+
+Overview
+========
+
+The ftrace stack map provides stack trace deduplication for the ftrace
+ring buffer. When enabled, instead of storing full kernel stack traces
+(typically 80-160 bytes each) in the ring buffer for every event, ftrace
+stores only a 4-byte ``stack_id``. The full stacks are maintained in a
+separate hash table and exported via tracefs for userspace to resolve.
+
+This is inspired by eBPF's ``BPF_MAP_TYPE_STACK_TRACE`` but integrated
+into ftrace's infrastructure, requiring no userspace daemon.
+
+Configuration
+=============
+
+Enable ``CONFIG_FTRACE_STACKMAP=y`` in the kernel config.
+
+Kernel command line parameters:
+
+- ``ftrace_stackmap.bits=N`` - Set map capacity to 2^N unique stacks (default: 14, range: 10-20)
+
+Usage
+=====
+
+Enable stack deduplication::
+
+ echo 1 > /sys/kernel/debug/tracing/options/stackmap
+ echo 1 > /sys/kernel/debug/tracing/options/stacktrace
+ echo function > /sys/kernel/debug/tracing/current_tracer
+
+The trace output will show ``<stack_id N>`` instead of full stack traces::
+
+ sh-1234 [006] d.h.. 123.456789: <stack_id 42>
+
+To view the actual stacks::
+
+ cat /sys/kernel/debug/tracing/stack_map
+
+Output format::
+
+ stack_id 42 [ref 1337, depth 8]
+ [0] schedule+0x48/0xc0
+ [1] schedule_timeout+0x1c/0x30
+ ...
+
+To view statistics::
+
+ cat /sys/kernel/debug/tracing/stack_map_stat
+
+Output::
+
+ entries: 2500 / 16384
+ table_size: 32768
+ hits: 148923
+ drops: 0
+ hit_rate: 100%
+
+To reset the stack map::
+
+ echo 0 > /sys/kernel/debug/tracing/stack_map
+
+Tracefs Nodes
+=============
+
+``stack_map``
+ Text export of all deduplicated stacks with symbol resolution.
+ Writing ``0`` or ``reset`` clears all entries.
+
+``stack_map_stat``
+ Statistics: entry count, hits, drops, and hit rate.
+
+``stack_map_bin``
+ Binary export for efficient userspace consumption. Format:
+
+ - Header (16 bytes): magic(u32) + version(u32) + nr_stacks(u32) + reserved(u32)
+ - Per stack: stack_id(u32) + nr(u32) + ref_count(u32) + reserved(u32) + ips(u64 × nr)
+
+ Magic: ``0x46534D42`` ('FSMB'), Version: 2
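+
+ The corresponding C structures (from ``kernel/trace/trace_stackmap.h``)::
+
+  struct ftrace_stackmap_bin_header {
+          u32 magic;       /* 0x46534D42, 'FSMB' */
+          u32 version;     /* currently 2 */
+          u32 nr_stacks;
+          u32 reserved;
+  };
+
+  struct ftrace_stackmap_bin_entry {
+          u32 stack_id;
+          u32 nr;          /* number of u64 IPs that follow */
+          u32 ref_count;
+          u32 reserved;
+          /* followed by u64 ips[nr] */
+  };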
+
+Design
+======
+
+The stack map is modeled after ``tracing_map.c`` (used by hist triggers),
+using a lock-free design based on Dr. Cliff Click's non-blocking hash table
+algorithm:
+
+- **Lookup/Insert**: Lock-free via ``cmpxchg``, safe in NMI/IRQ/any context
+- **Memory**: Pre-allocated element pool, zero allocation on the hot path
+ (no GFP_ATOMIC failures under memory pressure)
+- **Collision**: Linear probing with a 2x over-provisioned table
+- **Per-instance**: Each trace_array has its own stackmap, supporting
+ multiple ftrace instances
+- **Hash**: 32-bit jhash of stack IPs; full ``memcmp`` confirms matches
+
+Performance
+===========
+
+Typical results on an ARM64 Android device (function tracer, 2 seconds):
+
+- Unique stacks: ~3000
+- Hit rate: 84-98% (depends on workload diversity)
+- Ring buffer savings: ~80% for stack data
+- Overhead per event: ~50ns (one jhash + hash table lookup)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc b/tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc
new file mode 100755
index 000000000000..3b0a7f60769f
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc
@@ -0,0 +1,74 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: ftrace - stackmap basic functionality
+# requires: stack_map options/stackmap
+
+# Test that ftrace stackmap deduplication works:
+# 1. Enable stackmap + stacktrace options
+# 2. Run function tracer briefly
+# 3. Verify stack_map has entries
+# 4. Verify stack_map_stat shows hits
+# 5. Verify trace contains <stack_id> events
+# 6. Verify reset works
+
+fail() {
+ echo "FAIL: $1"
+ exit_fail
+}
+
+disable_tracing
+clear_trace
+
+# Verify stackmap files exist
+test -f stack_map || fail "stack_map file missing"
+test -f stack_map_stat || fail "stack_map_stat file missing"
+test -f stack_map_bin || fail "stack_map_bin file missing"
+
+# Enable stackmap dedup
+echo 1 > options/stackmap
+echo 1 > options/stacktrace
+
+# Run function tracer briefly
+echo function > current_tracer
+enable_tracing
+sleep 1
+disable_tracing
+echo nop > current_tracer
+echo 0 > options/stackmap
+
+# Check stack_map_stat has entries
+entries=$(cat stack_map_stat | grep "^entries:" | awk '{print $2}')
+if [ "$entries" -eq 0 ]; then
+ fail "stackmap has zero entries after tracing"
+fi
+
+# Check hits > 0
+hits=$(cat stack_map_stat | grep "^hits:" | awk '{print $2}')
+if [ "$hits" -eq 0 ]; then
+ fail "stackmap has zero hits"
+fi
+
+# Check drops == 0 (pool should be large enough for 1s trace)
+drops=$(cat stack_map_stat | grep "^drops:" | awk '{print $2}')
+
+# Check stack_map text output is parseable
+first_id=$(cat stack_map | grep "^stack_id" | head -1 | awk '{print $2}')
+if [ -z "$first_id" ]; then
+ fail "stack_map output has no stack_id entries"
+fi
+
+# Check trace has stack_id events
+count=$(cat trace | grep -c "stack_id" || true)
+if [ "$count" -eq 0 ]; then
+ fail "trace has no <stack_id> events"
+fi
+
+# Test reset
+echo 0 > stack_map
+entries_after=$(cat stack_map_stat | grep "^entries:" | awk '{print $2}')
+if [ "$entries_after" -ne 0 ]; then
+ fail "stackmap reset did not clear entries"
+fi
+
+echo "stackmap basic test passed: $entries unique stacks, $hits hits, $drops drops"
+exit 0
diff --git a/tools/tracing/stackmap_dump.py b/tools/tracing/stackmap_dump.py
new file mode 100755
index 000000000000..91ce80c681ea
--- /dev/null
+++ b/tools/tracing/stackmap_dump.py
@@ -0,0 +1,120 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+"""
+stackmap_dump.py - Parse and display ftrace stack_map_bin binary export.
+
+Usage:
+ # Pull from device and parse
+ adb pull /sys/kernel/debug/tracing/stack_map_bin /tmp/stack_map.bin
+ python3 stackmap_dump.py /tmp/stack_map.bin
+
+ # With vmlinux for offline symbol resolution
+ python3 stackmap_dump.py /tmp/stack_map.bin --vmlinux vmlinux
+
+ # JSON output for tooling
+ python3 stackmap_dump.py /tmp/stack_map.bin --json
+"""
+
+import struct
+import sys
+import argparse
+import json
+import subprocess
+
+MAGIC = 0x46534D42  # 'FSMB'
+HEADER_FMT = '<IIII' # magic, version, nr_stacks, reserved
+ENTRY_FMT = '<IIII' # stack_id, nr, ref_count, reserved
+HEADER_SIZE = struct.calcsize(HEADER_FMT)
+ENTRY_SIZE = struct.calcsize(ENTRY_FMT)
+
+
+def addr2line(vmlinux, addr):
+ """Resolve address to symbol using addr2line."""
+ try:
+ result = subprocess.run(
+ ['addr2line', '-f', '-e', vmlinux, hex(addr)],
+ capture_output=True, text=True, timeout=5
+ )
+ lines = result.stdout.strip().split('\n')
+ if len(lines) >= 1 and lines[0] != '??':
+ return lines[0]
+ except (subprocess.TimeoutExpired, FileNotFoundError):
+ pass
+ return None
+
+
+def parse_stackmap_bin(data):
+ """Parse binary stackmap data, yield (stack_id, ref_count, [ips])."""
+ if len(data) < HEADER_SIZE:
+ raise ValueError("File too small for header")
+
+ magic, version, nr_stacks, _ = struct.unpack_from(HEADER_FMT, data, 0)
+ if magic != MAGIC:
+ raise ValueError(f"Bad magic: 0x{magic:08x}, expected 0x{MAGIC:08x}")
+ if version not in (1, 2):
+ raise ValueError(f"Unsupported version: {version}")
+
+ offset = HEADER_SIZE
+ for _ in range(nr_stacks):
+ if offset + ENTRY_SIZE > len(data):
+ break
+ stack_id, nr, ref_count, _ = struct.unpack_from(ENTRY_FMT, data, offset)
+ offset += ENTRY_SIZE
+
+ ips_size = nr * 8
+ if offset + ips_size > len(data):
+ break
+ ips = struct.unpack_from(f'<{nr}Q', data, offset)
+ offset += ips_size
+
+ yield stack_id, ref_count, list(ips)
+
+
+def main():
+ parser = argparse.ArgumentParser(description='Parse ftrace stack_map_bin')
+ parser.add_argument('file', help='Path to stack_map_bin file')
+ parser.add_argument('--vmlinux', help='Path to vmlinux for symbol resolution')
+ parser.add_argument('--json', action='store_true', help='JSON output')
+ parser.add_argument('--top', type=int, default=0,
+ help='Show only top N stacks by ref_count')
+ args = parser.parse_args()
+
+ with open(args.file, 'rb') as f:
+ data = f.read()
+
+ stacks = list(parse_stackmap_bin(data))
+
+ if args.top > 0:
+ stacks.sort(key=lambda x: x[1], reverse=True)
+ stacks = stacks[:args.top]
+
+ if args.json:
+ output = []
+ for stack_id, ref_count, ips in stacks:
+ entry = {
+ 'stack_id': stack_id,
+ 'ref_count': ref_count,
+ 'ips': [f'0x{ip:x}' for ip in ips]
+ }
+ if args.vmlinux:
+ entry['symbols'] = [addr2line(args.vmlinux, ip) or f'0x{ip:x}'
+ for ip in ips]
+ output.append(entry)
+ print(json.dumps(output, indent=2))
+ else:
+ for stack_id, ref_count, ips in stacks:
+ print(f"stack_id {stack_id} [ref {ref_count}, depth {len(ips)}]")
+ for i, ip in enumerate(ips):
+ sym = ''
+ if args.vmlinux:
+ resolved = addr2line(args.vmlinux, ip)
+ if resolved:
+ sym = f' {resolved}'
+ print(f" [{i}] 0x{ip:x}{sym}")
+ print()
+
+ print(f"Total: {len(stacks)} unique stacks", file=sys.stderr)
+
+
+if __name__ == '__main__':
+ main()
--
2.34.1