* [PATCH v3 0/2] mm/kmemleak: dedupe verbose scan output
@ 2026-05-06 12:58 Breno Leitao
2026-05-06 12:58 ` [PATCH v3 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace Breno Leitao
From: Breno Leitao @ 2026-05-06 12:58 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, Catalin Marinas, Liam R. Howlett
Cc: linux-kernel, linux-mm, linux-kselftest, kernel-team,
Breno Leitao
In this version I am not touching the kernel code; I am only fixing
the selftest, as sashiko reported some issues, which I am addressing
here.
https://sashiko.dev/#/patchset/20260424-kmemleak_dedup-v2-0-8bea649b2a92%40debian.org
NOTE: Additional testing has revealed that lockdep detects a potential lock
inversion between kmemleak and the legacy console.
The problem occurs because kmemleak's scan reporting path holds &object->lock (a raw spinlock)
while invoking printk to report the leak. This printk requires the legacy
console_owner lock, which remains in active use despite the ongoing transition
to the nbcon framework.
Concurrently, console drivers such as hvc (hypervisor virtual console) acquire
the console_owner lock and may subsequently free memory. This kfree() operation
calls into kmemleak's __delete_object(), which in turn acquires object->lock.
Although these code paths hold locks from different object instances, lockdep
operates on lock classes rather than individual instances. Since all
kmemleak_object->lock instances belong to the same lock class, lockdep
identifies this as a circular dependency, even though the actual deadlock
scenario cannot occur in practice.
This problem was not introduced by this patchset; the selftest merely
exposes it. I plan to address it once this patchset is done.
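For reference, the inversion lockdep complains about can be sketched roughly
like this (a simplified schematic, not the literal call chains from the
splat):

```
  kmemleak scan/report path            hvc console path
  -------------------------            ----------------
  raw_spin_lock(&object->lock)         acquire console_owner
    printk("unreferenced object...")     kfree()
      acquire console_owner                __delete_object()
                                             raw_spin_lock(&object->lock)
```

Because lockdep folds every kmemleak_object->lock instance into one lock
class, it sees object->lock -> console_owner on one CPU and
console_owner -> object->lock on the other, and reports a cycle even though
the two object->lock instances are distinct.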
---
Changes in v3:
- No changes to the first patch (Kernel changes). All changes below are
for the selftest.
- Pre-cleanup before modprobe — rmmod "$MODULE" 2>/dev/null added before
modprobe so a stale load doesn't cause modprobe to be a no-op.
- dmesg -C between the two scans — isolates printed count to the second
(reporting) scan only.
- Link to v2: https://patch.msgid.link/20260424-kmemleak_dedup-v2-0-8bea649b2a92@debian.org
Changes in v2:
- Drop struct kmemleak_dedup_entry and its kmalloc. (Catalin)
- Handle trace_handle == 0 instead of dropping it.
- Skip hex dump for coalesced entries (dup_count > 1) — bytes would differ
across objects sharing a trace anyway, and it removes the only
object->pointer read left in the deferred path.
- Counter narrowed from unsigned long count to unsigned int dup_count.
- Link to v1: https://patch.msgid.link/20260421-kmemleak_dedup-v1-0-65e31c6cdf0c@debian.org
---
Breno Leitao (2):
mm/kmemleak: dedupe verbose scan output by allocation backtrace
selftests/mm: add kmemleak verbose dedup test
mm/kmemleak.c | 148 ++++++++++++++-
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/ksft_kmemleak_dedup.sh | 222 ++++++++++++++++++++++
3 files changed, 363 insertions(+), 8 deletions(-)
---
base-commit: 4cd074ae20bbcc293bbbce9163abe99d68ae6ae0
change-id: 20260420-kmemleak_dedup-bee54ffa65e7
Best regards,
--
Breno Leitao <leitao@debian.org>
* [PATCH v3 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace
2026-05-06 12:58 [PATCH v3 0/2] mm/kmemleak: dedupe verbose scan output Breno Leitao
@ 2026-05-06 12:58 ` Breno Leitao
2026-05-06 12:58 ` [PATCH v3 2/2] selftests/mm: add kmemleak verbose dedup test Breno Leitao
2026-05-08 22:17 ` [PATCH v3 0/2] mm/kmemleak: dedupe verbose scan output Andrew Morton
From: Breno Leitao @ 2026-05-06 12:58 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, Catalin Marinas, Liam R. Howlett
Cc: linux-kernel, linux-mm, linux-kselftest, kernel-team,
Breno Leitao
In kmemleak's verbose mode, every unreferenced object found during a
scan is logged with its full header, hex dump and 16-frame backtrace.
Workloads that leak many objects from a single allocation site flood
dmesg with byte-for-byte identical backtraces, drowning out distinct
leaks and other kernel messages.
Dedupe within each scan using stackdepot's trace_handle as the key: for
every leaked object with a recorded stack trace, look up the
representative kmemleak_object in a per-scan xarray keyed by
trace_handle. The first sighting stores the object pointer (with a
get_object() reference) and sets object->dup_count to 1; later
sightings just bump dup_count on the representative. After the scan,
walk the xarray once and emit each unique backtrace, followed by a
single summary line when more than one object shares it.
Leaks whose trace_handle is 0 (early-boot allocations tracked before
kmemleak_init() set up object_cache, or stack_depot_save() failures
under memory pressure) cannot be deduped, so they are still printed
inline via the same locked OBJECT_ALLOCATED-checked helper. The
contents of /sys/kernel/debug/kmemleak are unchanged - only the
verbose console output is collapsed.
Safety notes:
- The xarray store happens outside object->lock: object->lock is a
raw spinlock, while xa_store() may grab xa_node slab locks at a
higher wait-context level which lockdep flags as invalid.
trace_handle is captured under object->lock (which serialises with
kmemleak_update_trace()'s writer), so it is safe to use after
dropping the lock.
- get_object() pins the kmemleak_object metadata across
rcu_read_unlock(), but the underlying tracked allocation can still
be freed concurrently. The deferred print path therefore re-acquires
object->lock and re-checks OBJECT_ALLOCATED via print_leak_locked()
before touching object->pointer; __delete_object() clears that flag
under the same lock before the user memory goes away. The same
helper is used by the trace_handle == 0 and xa_store() failure
fallbacks, so every printer in the new path has identical safety
guarantees.
- If get_object() fails after we set OBJECT_REPORTED, the object is
already being torn down (use_count hit zero); the leak count is
still accurate but the verbose line is dropped, which is correct
- the memory was freed concurrently and is no longer a leak.
- If xa_store() fails to allocate an xa_node under memory pressure,
we fall back to printing inline via print_leak_locked() instead of
silently dropping the leak.
- The hex dump is skipped for coalesced entries (dup_count > 1):
bytes would differ across objects sharing a backtrace anyway, and
skipping it removes the only remaining read of object->pointer's
contents in the deferred path. The representative's reported size
may also differ from the coalesced objects' sizes; the printed
trace_handle reflects the representative's current value rather
than the value used as the dedup key, which is normally - but not
strictly - identical.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
mm/kmemleak.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 140 insertions(+), 8 deletions(-)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 2eff0d6b622b6..7c7ba17ce7af0 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -92,6 +92,7 @@
#include <linux/nodemask.h>
#include <linux/mm.h>
#include <linux/workqueue.h>
+#include <linux/xarray.h>
#include <linux/crc32.h>
#include <asm/sections.h>
@@ -157,6 +158,8 @@ struct kmemleak_object {
struct hlist_head area_list;
unsigned long jiffies; /* creation timestamp */
pid_t pid; /* pid of the current task */
+ /* per-scan dedup count, valid only while in scan-local dedup xarray */
+ unsigned int dup_count;
char comm[TASK_COMM_LEN]; /* executable name */
};
@@ -360,8 +363,9 @@ static const char *__object_type_str(struct kmemleak_object *object)
* Printing of the unreferenced objects information to the seq file. The
* print_unreferenced function must be called with the object->lock held.
*/
-static void print_unreferenced(struct seq_file *seq,
- struct kmemleak_object *object)
+static void __print_unreferenced(struct seq_file *seq,
+ struct kmemleak_object *object,
+ bool hex_dump)
{
int i;
unsigned long *entries;
@@ -373,7 +377,8 @@ static void print_unreferenced(struct seq_file *seq,
object->pointer, object->size);
warn_or_seq_printf(seq, " comm \"%s\", pid %d, jiffies %lu\n",
object->comm, object->pid, object->jiffies);
- hex_dump_object(seq, object);
+ if (hex_dump)
+ hex_dump_object(seq, object);
warn_or_seq_printf(seq, " backtrace (crc %x):\n", object->checksum);
for (i = 0; i < nr_entries; i++) {
@@ -382,6 +387,12 @@ static void print_unreferenced(struct seq_file *seq,
}
}
+static void print_unreferenced(struct seq_file *seq,
+ struct kmemleak_object *object)
+{
+ __print_unreferenced(seq, object, true);
+}
+
/*
* Print the kmemleak_object information. This function is used mainly for
* debugging special cases when kmemleak operations. It must be called with
@@ -1684,6 +1695,103 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
put_object(object);
}
+/*
+ * Print one leak inline. The hex dump is gated on OBJECT_ALLOCATED so it
+ * does not touch user memory that was freed concurrently; the rest of the
+ * report (backtrace, comm, pid) is always emitted since the kmemleak_object
+ * metadata is pinned by the caller.
+ */
+static void print_leak_locked(struct kmemleak_object *object, bool hex_dump)
+{
+ raw_spin_lock_irq(&object->lock);
+ __print_unreferenced(NULL, object,
+ hex_dump && (object->flags & OBJECT_ALLOCATED));
+ raw_spin_unlock_irq(&object->lock);
+}
+
+/*
+ * Per-scan dedup table for verbose leak printing. The xarray is keyed by
+ * stackdepot trace_handle and stores a pointer to the representative
+ * kmemleak_object. The per-scan repeat count lives in object->dup_count.
+ *
+ * dedup_record() must run outside object->lock: xa_store() may take
+ * mutexes (xa_node slab allocation) which lockdep would flag against the
+ * raw spinlock object->lock.
+ */
+static void dedup_record(struct xarray *dedup, struct kmemleak_object *object,
+ depot_stack_handle_t trace_handle)
+{
+ struct kmemleak_object *rep;
+ void *old;
+
+ /*
+ * No stack trace to dedup against: early-boot allocation tracked
+ * before kmemleak_init() set up object_cache, or stack_depot_save()
+ * failure under memory pressure.
+ */
+ if (!trace_handle) {
+ print_leak_locked(object, true);
+ return;
+ }
+
+ /* stack is available, now we can de-dup */
+ rep = xa_load(dedup, trace_handle);
+ if (rep) {
+ rep->dup_count++;
+ return;
+ }
+
+ /*
+ * Object is being torn down (use_count already hit zero); the
+ * tracked memory at object->pointer is unsafe to read, so skip.
+ */
+ if (!get_object(object))
+ return;
+
+ object->dup_count = 1;
+ old = xa_store(dedup, trace_handle, object, GFP_ATOMIC);
+ if (xa_is_err(old)) {
+ /* xa_node allocation failed; fall back to inline print. */
+ print_leak_locked(object, true);
+ put_object(object);
+ return;
+ }
+ /*
+ * scan_mutex serialises all writers to the dedup xarray, so xa_store()
+ * after a NULL xa_load() must always overwrite an empty slot.
+ */
+ WARN_ON_ONCE(old);
+}
+
+/*
+ * Drain the dedup table. Re-acquires object->lock and re-checks
+ * OBJECT_ALLOCATED before printing: while get_object() pins the
+ * kmemleak_object metadata, the underlying tracked allocation may have
+ * been freed since the scan walked it (kmemleak_free clears
+ * OBJECT_ALLOCATED under object->lock before the user memory goes away).
+ * The hex dump is skipped for coalesced entries since the bytes would
+ * differ across objects anyway.
+ */
+static void dedup_flush(struct xarray *dedup)
+{
+ struct kmemleak_object *object;
+ unsigned long idx;
+ unsigned int dup;
+ bool coalesced;
+
+ xa_for_each(dedup, idx, object) {
+ dup = object->dup_count;
+ coalesced = dup > 1;
+
+ print_leak_locked(object, !coalesced);
+ if (coalesced)
+ pr_warn(" ... and %u more object(s) with the same backtrace\n",
+ dup - 1);
+ put_object(object);
+ xa_erase(dedup, idx);
+ }
+}
+
/*
* Scan data sections and all the referenced memory blocks allocated via the
* kernel's standard allocators. This function must be called with the
@@ -1694,6 +1802,7 @@ static void kmemleak_scan(void)
struct kmemleak_object *object;
struct zone *zone;
int __maybe_unused i;
+ struct xarray dedup;
int new_leaks = 0;
jiffies_last_scan = jiffies;
@@ -1834,10 +1943,18 @@ static void kmemleak_scan(void)
return;
/*
- * Scanning result reporting.
+ * Scanning result reporting. When verbose printing is enabled, dedupe
+ * by stackdepot trace_handle so each unique backtrace is logged once
+ * per scan, annotated with the number of objects that share it. The
+ * per-leak count below still reflects every object, and
+ * /sys/kernel/debug/kmemleak still lists them individually.
*/
+ xa_init(&dedup);
rcu_read_lock();
list_for_each_entry_rcu(object, &object_list, object_list) {
+ depot_stack_handle_t trace_handle;
+ bool dedup_print;
+
if (need_resched())
kmemleak_cond_resched(object);
@@ -1849,18 +1966,33 @@ static void kmemleak_scan(void)
if (!color_white(object))
continue;
raw_spin_lock_irq(&object->lock);
+ trace_handle = 0;
+ dedup_print = false;
if (unreferenced_object(object) &&
!(object->flags & OBJECT_REPORTED)) {
object->flags |= OBJECT_REPORTED;
-
- if (kmemleak_verbose)
- print_unreferenced(NULL, object);
-
+ if (kmemleak_verbose) {
+ trace_handle = object->trace_handle;
+ dedup_print = true;
+ }
new_leaks++;
}
raw_spin_unlock_irq(&object->lock);
+
+ /*
+ * Defer the verbose print outside object->lock: xa_store()
+ * may take xa_node slab locks at a higher wait-context level
+ * which lockdep would flag against the raw_spinlock_t
+ * object->lock. rcu_read_lock() keeps the kmemleak_object
+ * alive across the call.
+ */
+ if (dedup_print)
+ dedup_record(&dedup, object, trace_handle);
}
rcu_read_unlock();
+ /* Flush'em all */
+ dedup_flush(&dedup);
+ xa_destroy(&dedup);
if (new_leaks) {
kmemleak_found_leaks = true;
--
2.52.0
* [PATCH v3 2/2] selftests/mm: add kmemleak verbose dedup test
2026-05-06 12:58 [PATCH v3 0/2] mm/kmemleak: dedupe verbose scan output Breno Leitao
2026-05-06 12:58 ` [PATCH v3 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace Breno Leitao
@ 2026-05-06 12:58 ` Breno Leitao
2026-05-08 22:17 ` [PATCH v3 0/2] mm/kmemleak: dedupe verbose scan output Andrew Morton
2 siblings, 0 replies; 4+ messages in thread
From: Breno Leitao @ 2026-05-06 12:58 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, Catalin Marinas, Liam R. Howlett
Cc: linux-kernel, linux-mm, linux-kselftest, kernel-team,
Breno Leitao
Add a regression test for the per-scan verbose dedup added in the
preceding commit. The test loads samples/kmemleak's helper module
(CONFIG_SAMPLE_KMEMLEAK=m) to generate orphan allocations, several of
which share an allocation backtrace, runs four kmemleak scans with
verbose printing enabled, then walks dmesg looking for two
"unreferenced object" reports within a single scan that share an
identical backtrace - which would mean dedup failed to collapse them.
The test is intentionally permissive on detection but strict on
regressions:
- PASS when no duplicates are observed, regardless of whether the
dedup summary line ("... and N more object(s) with the same
backtrace") was actually emitted. Per-CPU chunk reuse, slab
freelist pointers, kernel stack residue and
CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN can all keep most of the orphans
"still referenced" or
reported across many separate scans, so the dedup path may have
nothing to fold within one scan. That is not a regression.
- PASS reports whether dedup actually fired, so a passing run on a
well-behaved environment is still informative.
- FAIL when two same-backtrace reports land in a single scan (clear
dedup regression).
- FAIL when kmemleak's own per-scan tally counts leaks but the
verbose path emits zero "unreferenced object" lines - that catches
a regression in the verbose printer itself, which would otherwise
pass the duplicate check trivially.
- SKIP when kmemleak is absent, disabled at runtime, or the helper
module is not built.
The dmesg parser anchors stack-frame matching to the indentation
kmemleak uses for them (4+ spaces under "kmemleak: ") so unrelated
kmemleak warnings landing between reports do not get lumped into the
backtrace key and mask a duplicate.
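As an illustration of the parsing strategy, here is a toy reduction of the
duplicate-backtrace check run on synthetic, timestamp-stripped log text.
The function names in the sample input are hypothetical, and unlike the
real script, frames are keyed on their "+0x" offset marker rather than on
indentation; crc headers and empty backtraces are ignored for brevity:

```shell
#!/bin/sh
# Toy reduction of the duplicate-backtrace check: collect each report's
# stack frames into a key, count keys per scan window, and flag any key
# seen more than once within a single scan.
check_dup() {
	awk '
	/unreferenced object/ { if (bt != "") seen[bt]++; bt = ""; next }
	/new suspected memory leaks/ {
		if (bt != "") seen[bt]++
		bt = ""
		for (b in seen) if (seen[b] > 1) dup++
		delete seen
		next
	}
	/\+0x/ { bt = bt $0 "\n" }
	END { print (dup ? "DUP" : "OK") }'
}

# Two reports sharing a backtrace in one scan: a dedup regression.
r1=$(printf '%s\n' \
	'kmemleak: unreferenced object 0xAAAA (size 32):' \
	'kmemleak:    some_leaky_fn+0x10/0x20' \
	'kmemleak: unreferenced object 0xBBBB (size 32):' \
	'kmemleak:    some_leaky_fn+0x10/0x20' \
	'kmemleak: 2 new suspected memory leaks' | check_dup)

# Deduped output: one report plus the summary line. No violation.
r2=$(printf '%s\n' \
	'kmemleak: unreferenced object 0xAAAA (size 32):' \
	'kmemleak:    some_leaky_fn+0x10/0x20' \
	'kmemleak: ... and 1 more object(s) with the same backtrace' \
	'kmemleak: 2 new suspected memory leaks' | check_dup)

echo "$r1 $r2"   # prints: DUP OK
```

The real selftest adds the pieces this sketch omits: the sed pass that
strips dmesg timestamps, the crc-line mode switch, and skipping empty
backtraces from trace_handle == 0 leaks.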
Signed-off-by: Breno Leitao <leitao@debian.org>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/ksft_kmemleak_dedup.sh | 222 ++++++++++++++++++++++
2 files changed, 223 insertions(+)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 18779045b7f69..41053fdaad88d 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -151,6 +151,7 @@ TEST_PROGS += ksft_gup_test.sh
TEST_PROGS += ksft_hmm.sh
TEST_PROGS += ksft_hugetlb.sh
TEST_PROGS += ksft_hugevm.sh
+TEST_PROGS += ksft_kmemleak_dedup.sh
TEST_PROGS += ksft_ksm.sh
TEST_PROGS += ksft_ksm_numa.sh
TEST_PROGS += ksft_madv_guard.sh
diff --git a/tools/testing/selftests/mm/ksft_kmemleak_dedup.sh b/tools/testing/selftests/mm/ksft_kmemleak_dedup.sh
new file mode 100755
index 0000000000000..d019502444901
--- /dev/null
+++ b/tools/testing/selftests/mm/ksft_kmemleak_dedup.sh
@@ -0,0 +1,222 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Regression test for kmemleak's per-scan verbose dedup.
+#
+# Loads samples/kmemleak's helper module to generate orphan allocations
+# (some of which share an allocation backtrace), runs a few kmemleak
+# scans with verbose printing enabled, and verifies that no two
+# "unreferenced object" reports within a single scan share the same
+# backtrace - which would mean dedup failed to collapse them.
+#
+# This test is intentionally permissive: the kmemleak-test module's
+# leaks frequently get reported across many separate scans (per-CPU
+# chunk reuse, slab freelist pointers, kernel stack residue), so dedup
+# may never have anything to fold within one scan. That is not a
+# regression. The test only fails when it actually catches dedup not
+# happening on input that should have triggered it - i.e. two reports
+# with identical backtraces in the same scan.
+#
+# Author: Breno Leitao <leitao@debian.org>
+
+ksft_skip=4
+KMEMLEAK=/sys/kernel/debug/kmemleak
+VERBOSE_PARAM=/sys/module/kmemleak/parameters/verbose
+MODULE=kmemleak-test
+
+skip() {
+ echo "SKIP: $*"
+ exit $ksft_skip
+}
+
+fail() {
+ echo "FAIL: $*"
+ exit 1
+}
+
+pass() {
+ echo "PASS: $*"
+ exit 0
+}
+
+[ "$(id -u)" -eq 0 ] || skip "must run as root"
+[ -r "$KMEMLEAK" ] || skip "no kmemleak debugfs (CONFIG_DEBUG_KMEMLEAK)"
+[ -w "$VERBOSE_PARAM" ] || skip "kmemleak verbose param missing"
+modinfo "$MODULE" >/dev/null 2>&1 ||
+ skip "$MODULE not built (CONFIG_SAMPLE_KMEMLEAK)"
+
+# The verdict depends entirely on dmesg contents, so a silently-empty
+# dmesg (dmesg_restrict=1 with CAP_SYSLOG dropped, restricted container,
+# etc.) would let the script report PASS without parsing anything. Probe
+# both read and clear up front and skip cleanly if either is denied.
+dmesg >/dev/null 2>&1 ||
+ skip "cannot read dmesg (need CAP_SYSLOG or dmesg_restrict=0)"
+dmesg -C >/dev/null 2>&1 ||
+ skip "cannot clear dmesg (need CAP_SYSLOG or dmesg_restrict=0)"
+
+# kmemleak can be present but disabled at runtime (boot arg kmemleak=off,
+# or it self-disabled after an internal error). In that state writes other
+# than "clear" return EPERM, so probe once and skip if so.
+if ! echo scan > "$KMEMLEAK" 2>/dev/null; then
+ skip "kmemleak is disabled (check dmesg or kmemleak= boot arg)"
+fi
+
+prev_verbose=$(cat "$VERBOSE_PARAM")
+# shellcheck disable=SC2317 # invoked indirectly via trap
+cleanup() {
+ echo "$prev_verbose" > "$VERBOSE_PARAM" 2>/dev/null
+ rmmod "$MODULE" 2>/dev/null
+ # Drain the leak set we generated. Subsequent selftests (e.g.
+ # tools/testing/selftests/net/netfilter/nft_interface_stress.sh)
+ # fail on any non-empty kmemleak report, so leaving the helper
+ # module's intentional leaks behind would poison the rest of a
+ # kselftest run.
+ #
+ # Caveat: kmemleak_clear() only greys objects that have already
+ # been reported (OBJECT_REPORTED && unreferenced_object()). Helper
+ # allocations that stayed "still referenced" throughout the test
+ # (stale pointers in per-CPU chunks, slab freelists, kernel stacks)
+ # were never reported and are therefore not greyed by this clear -
+ # they remain tracked and a later scan can still surface them. Such
+ # leftovers are inherent to the kmemleak-test sample module and are
+ # not specific to this test; consumers that fail on any kmemleak
+ # output (rather than on the test-specific backtraces) need to be
+ # robust to that, or this test should be excluded from the run.
+ echo clear > "$KMEMLEAK" 2>/dev/null
+}
+trap cleanup EXIT
+
+echo 1 > "$VERBOSE_PARAM"
+
+# Drain the existing leak set so the next scan only reports our objects.
+echo clear > "$KMEMLEAK"
+
+# Re-clear dmesg now (the up-front probe also cleared it, but anything
+# logged between then and here - module unload chatter, the probe scan,
+# the verbose-param write - would otherwise pollute the parse window).
+dmesg -C >/dev/null
+
+# If the module was left loaded by a previous aborted run, modprobe would
+# be a no-op and the init function would not run, so no new leaks would be
+# generated. Force a clean state first.
+rmmod "$MODULE" 2>/dev/null
+modprobe "$MODULE" || skip "failed to load $MODULE"
+# Removing the module orphans the list elements without freeing them.
+rmmod "$MODULE" || skip "failed to unload $MODULE"
+
+# Run a handful of scans so kmemleak has the chance to age and report
+# the orphans. We do not require any particular number to be reported:
+# the regression check below operates on whatever lands in dmesg.
+#
+# Note: with CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y the kernel's own scan
+# thread can report and mark these orphans (OBJECT_REPORTED) before our
+# manual scans run, after which our scans will see nothing. The
+# lower-bound check below catches the case where that happens and the
+# manual scans also produce nothing.
+SCAN_COUNT=4
+SCAN_SLEEP=6
+for _ in $(seq 1 "$SCAN_COUNT"); do
+ echo scan > "$KMEMLEAK"
+ sleep "$SCAN_SLEEP"
+done
+
+# Strip the leading "[ nnn.nnnnnn] " dmesg timestamp prefix. Without
+# this, two identical stack frames printed from two reports in the same
+# scan would produce different per-frame strings (different timestamps)
+# and the duplicate-backtrace check below would not match them, silently
+# passing a real dedup regression. Doing the strip here makes the rest
+# of the parser timestamp-agnostic regardless of what dmesg defaults to.
+log=$(dmesg | sed 's/^\[[^]]*\] //')
+
+# After running the workload (modprobe + scans), dmesg should contain at
+# least the helper module's pr_info lines and our manual-scan output. An
+# empty capture here means dmesg succeeded earlier but is now denying us
+# the buffer (race with dmesg_restrict toggling, etc.); refuse to give a
+# verdict on no evidence.
+[ -n "$log" ] || skip "dmesg returned empty after running workload"
+
+# Lower bound: if kmemleak's own per-scan tally counted leaks but the
+# verbose path emitted no "unreferenced object" line, the verbose printer
+# itself is regressed - fail rather than silently passing on no input.
+new_leaks=$(echo "$log" |
+ sed -n 's/.*kmemleak: \([0-9]\+\) new suspected.*/\1/p' |
+ awk '{s+=$1} END{print s+0}')
+printed=$(echo "$log" | grep -c 'kmemleak: unreferenced object')
+if [ "$new_leaks" -gt 0 ] && [ "$printed" -eq 0 ]; then
+ fail "verbose path broken: $new_leaks leaks counted, 0 printed in $SCAN_COUNT scans"
+fi
+
+# Walk the log: split into per-scan chunks at "N new suspected memory
+# leaks" boundaries; within each chunk, capture each "unreferenced
+# object" report's backtrace and check that no backtrace is reported
+# more than once. A duplicate within a single scan means dedup failed
+# to collapse two leaks that share an allocation site.
+violations=$(echo "$log" | awk '
+ function flush_block() {
+ if (in_block) {
+ # Skip empty backtraces: leaks with trace_handle == 0
+ # (early-boot allocations or stack_depot_save() failures
+ # under memory pressure) are intentionally not deduped,
+ # so multiple such reports in one scan are expected and
+ # must not be flagged as a regression.
+ if (bt != "")
+ seen[bt]++
+ in_block = 0
+ collecting = 0
+ bt = ""
+ }
+ }
+ function check_and_reset( b) {
+ for (b in seen)
+ if (seen[b] > 1)
+ printf("backtrace seen %d times in one scan:\n%s\n",
+ seen[b], b)
+ delete seen
+ }
+ # Scan boundary: the per-scan summary line.
+ /kmemleak: [0-9]+ new suspected memory leaks/ {
+ flush_block()
+ check_and_reset()
+ next
+ }
+ # Start of a new "unreferenced object" report.
+ /kmemleak: unreferenced object/ {
+ flush_block()
+ in_block = 1
+ next
+ }
+ # Inside a report, the "backtrace (crc ...):" line switches us to
+ # backtrace-collecting mode.
+ in_block && /kmemleak:[[:space:]]+backtrace \(crc/ {
+ collecting = 1
+ next
+ }
+ # Once collecting, capture only deeply-indented "kmemleak: " lines
+ # (stack frames have 4+ spaces of indentation under "kmemleak: ";
+ # headers and the "... and N more" tail line have less). This stops
+ # unrelated kmemleak warns landing between reports from being lumped
+ # into the backtrace key, which would mask a genuine duplicate.
+ in_block && collecting && /kmemleak:[[:space:]]{4,}/ {
+ bt = bt $0 "\n"
+ next
+ }
+ END {
+ flush_block()
+ check_and_reset()
+ }
+')
+
+if [ -n "$violations" ]; then
+ echo "$violations"
+ fail "kmemleak dedup regression: same backtrace reported more than once in a single scan"
+fi
+
+# Count the dedup summary lines so the report distinguishes "dedup
+# actually fired" from "no same-backtrace leaks turned up to dedup".
+dedup_lines=$(echo "$log" | grep -c 'more object(s) with the same backtrace')
+
+if [ "$dedup_lines" -gt 0 ]; then
+ pass "no dedup violations across $SCAN_COUNT scans; dedup fired ($dedup_lines summary line(s) observed)"
+else
+ pass "no dedup violations across $SCAN_COUNT scans; dedup had nothing to collapse"
+fi
--
2.52.0
* Re: [PATCH v3 0/2] mm/kmemleak: dedupe verbose scan output
2026-05-06 12:58 [PATCH v3 0/2] mm/kmemleak: dedupe verbose scan output Breno Leitao
2026-05-06 12:58 ` [PATCH v3 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace Breno Leitao
2026-05-06 12:58 ` [PATCH v3 2/2] selftests/mm: add kmemleak verbose dedup test Breno Leitao
@ 2026-05-08 22:17 ` Andrew Morton
From: Andrew Morton @ 2026-05-08 22:17 UTC (permalink / raw)
To: Breno Leitao
Cc: David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
Catalin Marinas, Liam R. Howlett, linux-kernel, linux-mm,
linux-kselftest, kernel-team
On Wed, 06 May 2026 05:58:23 -0700 Breno Leitao <leitao@debian.org> wrote:
> In this version I am not touching the kernel code; I am only fixing
> the selftest, as sashiko reported some issues, which I am addressing
> here.
>
> https://sashiko.dev/#/patchset/20260424-kmemleak_dedup-v2-0-8bea649b2a92%40debian.org
>
> NOTE: Additional testing has revealed that lockdep detects a potential lock
> inversion between kmemleak and the legacy console.
>
> The problem occurs because kmemleak's scan reporting path holds &object->lock (a raw spinlock)
> while invoking printk to report the leak. This printk requires the legacy
> console_owner lock, which remains in active use despite the ongoing transition
> to the nbcon framework.
>
> Concurrently, console drivers such as hvc (hypervisor virtual console) acquire
> the console_owner lock and may subsequently free memory. This kfree() operation
> calls into kmemleak's __delete_object(), which in turn acquires object->lock.
>
> Although these code paths hold locks from different object instances, lockdep
> operates on lock classes rather than individual instances. Since all
> kmemleak_object->lock instances belong to the same lock class, lockdep
> identifies this as a circular dependency, even though the actual deadlock
> scenario cannot occur in practice.
>
> This problem was not introduced by this patchset; the selftest merely
> exposes it. I plan to address it once this patchset is done.
None of the above is usable for a [0/N] - it all pertains to the
ongoing development process and should formally be below the ^---$
separator. Because it isn't relevant to the permanent kernel record.
The v2 series had a nice cover letter, so I stole that. Please check
that the below remains the truth, the whole truth, etc.
From: Breno Leitao <leitao@debian.org>
Subject: mm/kmemleak: dedupe verbose scan output by allocation backtrace
Date: Wed, 06 May 2026 05:58:24 -0700
Patch series "mm/kmemleak: dedupe verbose scan output", v3.
I am starting to run kmemleak with verbose mode enabled on some "probe
points" across my employer's fleet so that suspected leaks land in
dmesg without needing a separate read of /sys/kernel/debug/kmemleak.
The downside is that workloads which leak many objects from a single
allocation site flood the console with byte-for-byte identical backtraces.
Hundreds of duplicates per scan are common, drowning out distinct leaks
and unrelated kernel messages, while adding no signal beyond the first
occurrence.
This series collapses those duplicates inside kmemleak itself. Each
unique stackdepot trace_handle prints once per scan, followed by a short
summary line when more than one object shares it:
kmemleak: unreferenced object 0xff110001083beb00 (size 192):
kmemleak: comm "modprobe", pid 974, jiffies 4294754196
kmemleak: ...
kmemleak: backtrace (crc 6f361828):
kmemleak: __kmalloc_cache_noprof+0x1af/0x650
kmemleak: ...
kmemleak: ... and 71 more object(s) with the same backtrace
The "N new suspected memory leaks" tally and the contents of
/sys/kernel/debug/kmemleak are unchanged - the per-object detail is still
available on demand, only the verbose (dmesg) output is collapsed.
Patch 1 is the kmemleak change.
Patch 2 adds a selftest that loads samples/kmemleak's kmemleak-test
module (CONFIG_SAMPLE_KMEMLEAK) to generate ten leaks sharing one call
site and checks that the printed count is strictly less than the
reported leak total. I am not sure whether Patch 2 is useful; if not,
it is easy to discard.
This patch (of 2):
In kmemleak's verbose mode, every unreferenced object found during a scan
is logged with its full header, hex dump and 16-frame backtrace.
Workloads that leak many objects from a single allocation site flood dmesg
with byte-for-byte identical backtraces, drowning out distinct leaks and
other kernel messages.
Dedupe within each scan using stackdepot's trace_handle as the key: for
every leaked object with a recorded stack trace, look up the
representative kmemleak_object in a per-scan xarray keyed by trace_handle.
The first sighting stores the object pointer (with a get_object()
reference) and sets object->dup_count to 1; later sightings just bump
dup_count on the representative. After the scan, walk the xarray once and
emit each unique backtrace, followed by a single summary line when more
than one object shares it.
Leaks whose trace_handle is 0 (early-boot allocations tracked before
kmemleak_init() set up object_cache, or stack_depot_save() failures under
memory pressure) cannot be deduped, so they are still printed inline via
the same locked OBJECT_ALLOCATED-checked helper. The contents of
/sys/kernel/debug/kmemleak are unchanged - only the verbose console output
is collapsed.
Safety notes:
- The xarray store happens outside object->lock: object->lock is a
raw spinlock, while xa_store() may take xa_node slab locks at a
higher wait-context level, which lockdep flags as invalid.
trace_handle is captured under object->lock (which serialises
against kmemleak_update_trace()'s writer), so it is safe to use
after dropping the lock.
- get_object() pins the kmemleak_object metadata across
rcu_read_unlock(), but the underlying tracked allocation can still
be freed concurrently. The deferred print path therefore re-acquires
object->lock and re-checks OBJECT_ALLOCATED via print_leak_locked()
before touching object->pointer; __delete_object() clears that flag
under the same lock before the user memory goes away. The same
helper is used by the trace_handle == 0 and xa_store() failure
fallbacks, so every printer in the new path has identical safety
guarantees.
- If get_object() fails after we set OBJECT_REPORTED, the object is
already being torn down (use_count hit zero); the leak count is
still accurate but the verbose line is dropped, which is correct
- the memory was freed concurrently and is no longer a leak.
- If xa_store() fails to allocate an xa_node under memory pressure,
we fall back to printing inline via print_leak_locked() instead of
silently dropping the leak.
- The hex dump is skipped for coalesced entries (dup_count > 1):
bytes would differ across objects sharing a backtrace anyway, and
skipping it removes the only remaining read of object->pointer's
contents in the deferred path. The representative's reported size
may also differ from the coalesced objects' sizes; the printed
trace_handle reflects the representative's current value rather
than the value used as the dedup key, which is normally - but not
strictly - identical.
Link: https://lore.kernel.org/20260506-kmemleak_dedup-v3-0-2d36aafc34da@debian.org
Link: https://lore.kernel.org/20260506-kmemleak_dedup-v3-1-2d36aafc34da@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/kmemleak.c | 148 +++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 140 insertions(+), 8 deletions(-)
--- a/mm/kmemleak.c~mm-kmemleak-dedupe-verbose-scan-output-by-allocation-backtrace
+++ a/mm/kmemleak.c
@@ -92,6 +92,7 @@
#include <linux/nodemask.h>
#include <linux/mm.h>
#include <linux/workqueue.h>
+#include <linux/xarray.h>
#include <linux/crc32.h>
#include <asm/sections.h>
@@ -157,6 +158,8 @@ struct kmemleak_object {
struct hlist_head area_list;
unsigned long jiffies; /* creation timestamp */
pid_t pid; /* pid of the current task */
+ /* per-scan dedup count, valid only while in scan-local dedup xarray */
+ unsigned int dup_count;
char comm[TASK_COMM_LEN]; /* executable name */
};
@@ -360,8 +363,9 @@ static const char *__object_type_str(str
* Printing of the unreferenced objects information to the seq file. The
* print_unreferenced function must be called with the object->lock held.
*/
-static void print_unreferenced(struct seq_file *seq,
- struct kmemleak_object *object)
+static void __print_unreferenced(struct seq_file *seq,
+ struct kmemleak_object *object,
+ bool hex_dump)
{
int i;
unsigned long *entries;
@@ -373,7 +377,8 @@ static void print_unreferenced(struct se
object->pointer, object->size);
warn_or_seq_printf(seq, " comm \"%s\", pid %d, jiffies %lu\n",
object->comm, object->pid, object->jiffies);
- hex_dump_object(seq, object);
+ if (hex_dump)
+ hex_dump_object(seq, object);
warn_or_seq_printf(seq, " backtrace (crc %x):\n", object->checksum);
for (i = 0; i < nr_entries; i++) {
@@ -382,6 +387,12 @@ static void print_unreferenced(struct se
}
}
+static void print_unreferenced(struct seq_file *seq,
+ struct kmemleak_object *object)
+{
+ __print_unreferenced(seq, object, true);
+}
+
/*
* Print the kmemleak_object information. This function is used mainly for
* debugging special cases when kmemleak operations. It must be called with
@@ -1685,6 +1696,103 @@ unlock_put:
}
/*
+ * Print one leak inline. The hex dump is gated on OBJECT_ALLOCATED so it
+ * does not touch user memory that was freed concurrently; the rest of the
+ * report (backtrace, comm, pid) is always emitted since the kmemleak_object
+ * metadata is pinned by the caller.
+ */
+static void print_leak_locked(struct kmemleak_object *object, bool hex_dump)
+{
+ raw_spin_lock_irq(&object->lock);
+ __print_unreferenced(NULL, object,
+ hex_dump && (object->flags & OBJECT_ALLOCATED));
+ raw_spin_unlock_irq(&object->lock);
+}
+
+/*
+ * Per-scan dedup table for verbose leak printing. The xarray is keyed by
+ * stackdepot trace_handle and stores a pointer to the representative
+ * kmemleak_object. The per-scan repeat count lives in object->dup_count.
+ *
+ * dedup_record() must run outside object->lock: xa_store() may take
+ * mutexes (xa_node slab allocation) which lockdep would flag against the
+ * raw spinlock object->lock.
+ */
+static void dedup_record(struct xarray *dedup, struct kmemleak_object *object,
+ depot_stack_handle_t trace_handle)
+{
+ struct kmemleak_object *rep;
+ void *old;
+
+ /*
+ * No stack trace to dedup against: early-boot allocation tracked
+ * before kmemleak_init() set up object_cache, or stack_depot_save()
+ * failure under memory pressure.
+ */
+ if (!trace_handle) {
+ print_leak_locked(object, true);
+ return;
+ }
+
+ /* stack is available, now we can de-dup */
+ rep = xa_load(dedup, trace_handle);
+ if (rep) {
+ rep->dup_count++;
+ return;
+ }
+
+ /*
+ * Object is being torn down (use_count already hit zero); the
+ * tracked memory at object->pointer is unsafe to read, so skip.
+ */
+ if (!get_object(object))
+ return;
+
+ object->dup_count = 1;
+ old = xa_store(dedup, trace_handle, object, GFP_ATOMIC);
+ if (xa_is_err(old)) {
+ /* xa_node allocation failed; fall back to inline print. */
+ print_leak_locked(object, true);
+ put_object(object);
+ return;
+ }
+ /*
+ * scan_mutex serialises all writers to the dedup xarray, so xa_store()
+ * after a NULL xa_load() must always overwrite an empty slot.
+ */
+ WARN_ON_ONCE(old);
+}
+
+/*
+ * Drain the dedup table. Re-acquires object->lock and re-checks
+ * OBJECT_ALLOCATED before printing: while get_object() pins the
+ * kmemleak_object metadata, the underlying tracked allocation may have
+ * been freed since the scan walked it (kmemleak_free clears
+ * OBJECT_ALLOCATED under object->lock before the user memory goes away).
+ * The hex dump is skipped for coalesced entries since the bytes would
+ * differ across objects anyway.
+ */
+static void dedup_flush(struct xarray *dedup)
+{
+ struct kmemleak_object *object;
+ unsigned long idx;
+ unsigned int dup;
+ bool coalesced;
+
+ xa_for_each(dedup, idx, object) {
+ dup = object->dup_count;
+ coalesced = dup > 1;
+
+ print_leak_locked(object, !coalesced);
+ if (coalesced)
+ pr_warn(" ... and %u more object(s) with the same backtrace\n",
+ dup - 1);
+ put_object(object);
+ xa_erase(dedup, idx);
+ }
+}
+
+/*
* Scan data sections and all the referenced memory blocks allocated via the
* kernel's standard allocators. This function must be called with the
* scan_mutex held.
@@ -1694,6 +1802,7 @@ static void kmemleak_scan(void)
struct kmemleak_object *object;
struct zone *zone;
int __maybe_unused i;
+ struct xarray dedup;
int new_leaks = 0;
jiffies_last_scan = jiffies;
@@ -1834,10 +1943,18 @@ static void kmemleak_scan(void)
return;
/*
- * Scanning result reporting.
+ * Scanning result reporting. When verbose printing is enabled, dedupe
+ * by stackdepot trace_handle so each unique backtrace is logged once
+ * per scan, annotated with the number of objects that share it. The
+ * per-leak count below still reflects every object, and
+ * /sys/kernel/debug/kmemleak still lists them individually.
*/
+ xa_init(&dedup);
rcu_read_lock();
list_for_each_entry_rcu(object, &object_list, object_list) {
+ depot_stack_handle_t trace_handle;
+ bool dedup_print;
+
if (need_resched())
kmemleak_cond_resched(object);
@@ -1849,18 +1966,33 @@ static void kmemleak_scan(void)
if (!color_white(object))
continue;
raw_spin_lock_irq(&object->lock);
+ trace_handle = 0;
+ dedup_print = false;
if (unreferenced_object(object) &&
!(object->flags & OBJECT_REPORTED)) {
object->flags |= OBJECT_REPORTED;
-
- if (kmemleak_verbose)
- print_unreferenced(NULL, object);
-
+ if (kmemleak_verbose) {
+ trace_handle = object->trace_handle;
+ dedup_print = true;
+ }
new_leaks++;
}
raw_spin_unlock_irq(&object->lock);
+
+ /*
+ * Defer the verbose print outside object->lock: xa_store()
+ * may take xa_node slab locks at a higher wait-context level
+ * which lockdep would flag against the raw_spinlock_t
+ * object->lock. rcu_read_lock() keeps the kmemleak_object
+ * alive across the call.
+ */
+ if (dedup_print)
+ dedup_record(&dedup, object, trace_handle);
}
rcu_read_unlock();
+ /* Flush'em all */
+ dedup_flush(&dedup);
+ xa_destroy(&dedup);
if (new_leaks) {
kmemleak_found_leaks = true;
_