* [PATCH 0/2] mm/kmemleak: dedupe verbose scan output
@ 2026-04-21 13:45 Breno Leitao
2026-04-21 13:45 ` [PATCH 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace Breno Leitao
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Breno Leitao @ 2026-04-21 13:45 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, Catalin Marinas
Cc: linux-kernel, linux-mm, linux-kselftest, kernel-team,
Breno Leitao
I am starting to run kmemleak with verbose mode enabled at some "probe
points" across my employer's fleet, so that suspected leaks land in
dmesg without needing a separate read of /sys/kernel/debug/kmemleak.
The downside is that workloads which leak many objects from a single
allocation site flood the console with byte-for-byte identical
backtraces. Hundreds of duplicates per scan are common, drowning out
distinct leaks and unrelated kernel messages, while adding no signal
beyond the first occurrence.
This series collapses those duplicates inside kmemleak itself. Each
unique stackdepot trace_handle prints once per scan, followed by a
short summary line when more than one object shares it:
kmemleak: unreferenced object 0xff110001083beb00 (size 192):
kmemleak: comm "modprobe", pid 974, jiffies 4294754196
kmemleak: ...
kmemleak: backtrace (crc 6f361828):
kmemleak: __kmalloc_cache_noprof+0x1af/0x650
kmemleak: ...
kmemleak: ... and 71 more object(s) with the same backtrace
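Conceptually the collapsing is just duplicate counting keyed on the
backtrace, much like piping identical lines through `sort | uniq -c`. A
userspace sketch with hypothetical trace_handle values (illustration of
the effect only, not the kernel code):

```shell
# Four leaked objects, three of which share one stackdepot handle
# (hypothetical values). Deduping by key yields one line per unique
# backtrace plus the number of objects that share it.
printf '%s\n' 6f361828 6f361828 6f361828 9a0c2d11 > /tmp/handles.txt
sort /tmp/handles.txt | uniq -c | sort -rn
```

The kernel does the equivalent once per scan, keyed by the stackdepot
trace_handle rather than the text of the backtrace.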
The "N new suspected memory leaks" tally and the contents of
/sys/kernel/debug/kmemleak are unchanged: the per-object detail is
still available on demand; only the verbose (dmesg) output is collapsed.
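For reference, the on-demand path is the standard kmemleak debugfs
interface (needs CONFIG_DEBUG_KMEMLEAK, a mounted debugfs and root; the
snippet skips itself when that is not available):

```shell
# Trigger an immediate scan and dump the full per-object reports.
# Harmless no-op on kernels/configs without the kmemleak debugfs file.
if [ -w /sys/kernel/debug/kmemleak ]; then
    echo scan > /sys/kernel/debug/kmemleak
    cat /sys/kernel/debug/kmemleak
else
    echo "kmemleak debugfs file not available, skipping"
fi
```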
Patch 1 is the kmemleak change.
Patch 2 adds a selftest that loads samples/kmemleak's kmemleak-test
module (CONFIG_SAMPLE_KMEMLEAK) to generate ten leaks sharing one call
site, and checks that the printed backtrace count is strictly less than
the reported leak total. I am not sure whether Patch 2 is useful; if
not, it is easy to drop.
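The check itself reduces to comparing two numbers pulled from the log.
A sketch of that shape against a fabricated dmesg excerpt (hypothetical
file name and log lines; not the actual test_kmemleak_dedup.sh):

```shell
# Fabricated excerpt of a deduped verbose scan: ten leaks from one
# call site produce one printed backtrace plus a summary line.
cat > /tmp/dmesg.sample <<'EOF'
kmemleak: unreferenced object 0xff110001083beb00 (size 192):
kmemleak:   backtrace (crc 6f361828):
kmemleak:   ... and 9 more object(s) with the same backtrace
kmemleak: 10 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
EOF

printed=$(grep -c 'backtrace (crc' /tmp/dmesg.sample)
total=$(sed -n 's/^kmemleak: \([0-9][0-9]*\) new suspected.*/\1/p' /tmp/dmesg.sample)

# Dedup worked if fewer backtraces were printed than leaks reported.
if [ "$printed" -lt "$total" ]; then
    echo "dedup OK: $printed backtrace(s) for $total leak(s)"
else
    echo "dedup FAILED: $printed backtrace(s) for $total leak(s)" >&2
    exit 1
fi
```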
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Breno Leitao (2):
mm/kmemleak: dedupe verbose scan output by allocation backtrace
selftests/mm: add kmemleak verbose dedup test
mm/kmemleak.c | 113 +++++++++++++++++++++-
tools/testing/selftests/mm/test_kmemleak_dedup.sh | 86 ++++++++++++++++
2 files changed, 197 insertions(+), 2 deletions(-)
---
base-commit: 97e797263a5e963da3d1e66e743fd518567dfe37
change-id: 20260420-kmemleak_dedup-bee54ffa65e7
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace 2026-04-21 13:45 [PATCH 0/2] mm/kmemleak: dedupe verbose scan output Breno Leitao @ 2026-04-21 13:45 ` Breno Leitao 2026-04-23 14:29 ` Catalin Marinas 2026-04-21 13:45 ` [PATCH 2/2] selftests/mm: add kmemleak verbose dedup test Breno Leitao 2026-04-24 13:53 ` [PATCH 0/2] mm/kmemleak: dedupe verbose scan output Andrew Morton 2 siblings, 1 reply; 9+ messages in thread From: Breno Leitao @ 2026-04-21 13:45 UTC (permalink / raw) To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, Catalin Marinas Cc: linux-kernel, linux-mm, linux-kselftest, kernel-team, Breno Leitao In kmemleak's verbose mode, every unreferenced object found during a scan is logged with its full header, hex dump and 16-frame backtrace. Workloads that leak many objects from a single allocation site flood dmesg with byte-for-byte identical backtraces, drowning out distinct leaks and other kernel messages. Dedupe within each scan using stackdepot's trace_handle as the key: for every leaked object, look up an entry in a per-scan xarray keyed by trace_handle. The first sighting stores a representative object; later sightings just bump a counter. After the scan, walk the xarray once and emit each unique backtrace, followed by a single summary line when more than one object shares it. Important to say that the contents of /sys/kernel/debug/kmemleak are unchanged - only the verbose console output is collapsed. Note 1: The xarray operations and kmalloc(GFP_ATOMIC) for the dedup entry must happen outside object->lock: object->lock is a raw spinlock, while the slab path takes higher wait-context locks (n->list_lock), which lockdep flags as an invalid wait context. 
trace_handle is read under object->lock, which serialises with kmemleak_update_trace()'s writer, so it is safe to capture and use after dropping the lock. Note 2: Stashed object pointers carry a get_object() reference across rcu_read_unlock() that dedup_flush() drops after printing, preventing use-after-free if the underlying allocation is freed concurrently. Signed-off-by: Breno Leitao <leitao@debian.org> --- mm/kmemleak.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 111 insertions(+), 2 deletions(-) diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 2eff0d6b622b6..046847d372777 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -92,6 +92,7 @@ #include <linux/nodemask.h> #include <linux/mm.h> #include <linux/workqueue.h> +#include <linux/xarray.h> #include <linux/crc32.h> #include <asm/sections.h> @@ -1684,6 +1685,82 @@ static void kmemleak_cond_resched(struct kmemleak_object *object) put_object(object); } +/* + * Per-scan dedup table for verbose leak printing. Each entry collapses all + * leaks that share one allocation backtrace (keyed by stackdepot + * trace_handle) into a single representative object plus a count. + */ +struct kmemleak_dedup_entry { + struct kmemleak_object *object; + unsigned long count; +}; + +/* + * Record a leaked object in the dedup table. The representative object's + * use_count is incremented so it can be safely dereferenced by dedup_flush() + * outside the RCU read section; dedup_flush() drops the reference. On + * allocation failure (or a concurrent insert) the object is printed + * immediately, preserving today's "always log every leak" guarantee. + * Caller must not hold object->lock and must hold rcu_read_lock(). 
+ */ +static void dedup_record(struct xarray *dedup, struct kmemleak_object *object, + depot_stack_handle_t trace_handle) +{ + struct kmemleak_dedup_entry *entry; + + entry = xa_load(dedup, trace_handle); + if (entry) { + /* This is a known beast, just increase the counter */ + entry->count++; + return; + } + + /* + * A brand new report. Object will have object->use_count increased + * in here, and released put_object() at dedup_flush + */ + entry = kmalloc(sizeof(*entry), GFP_ATOMIC); + if (entry && get_object(object)) { + if (xa_insert(dedup, trace_handle, entry, GFP_ATOMIC) == 0) { + entry->object = object; + entry->count = 1; + return; + } + put_object(object); + } + kfree(entry); + + /* + * Fallback for kmalloc/get_object(): Just print it straight away + */ + raw_spin_lock_irq(&object->lock); + print_unreferenced(NULL, object); + raw_spin_unlock_irq(&object->lock); +} + +/* + * Drain the dedup table: print one full record per unique backtrace, + * followed by a summary line whenever more than one object shared it. + * Releases the reference dedup_record() took on each representative object. + */ +static void dedup_flush(struct xarray *dedup) +{ + struct kmemleak_dedup_entry *entry; + unsigned long idx; + + xa_for_each(dedup, idx, entry) { + raw_spin_lock_irq(&entry->object->lock); + print_unreferenced(NULL, entry->object); + raw_spin_unlock_irq(&entry->object->lock); + if (entry->count > 1) + pr_warn(" ... and %lu more object(s) with the same backtrace\n", + entry->count - 1); + put_object(entry->object); + kfree(entry); + xa_erase(dedup, idx); + } +} + /* * Scan data sections and all the referenced memory blocks allocated via the * kernel's standard allocators. This function must be called with the @@ -1834,10 +1911,19 @@ static void kmemleak_scan(void) return; /* - * Scanning result reporting. + * Scanning result reporting. 
When verbose printing is enabled, dedupe + * by stackdepot trace_handle so each unique backtrace is logged once + * per scan, annotated with the number of objects that share it. The + * per-leak count below still reflects every object, and + * /sys/kernel/debug/kmemleak still lists them individually. */ + struct xarray dedup; + + xa_init(&dedup); rcu_read_lock(); list_for_each_entry_rcu(object, &object_list, object_list) { + depot_stack_handle_t trace_handle; + if (need_resched()) kmemleak_cond_resched(object); @@ -1849,18 +1935,41 @@ static void kmemleak_scan(void) if (!color_white(object)) continue; raw_spin_lock_irq(&object->lock); + trace_handle = 0; if (unreferenced_object(object) && !(object->flags & OBJECT_REPORTED)) { object->flags |= OBJECT_REPORTED; if (kmemleak_verbose) - print_unreferenced(NULL, object); + trace_handle = object->trace_handle; new_leaks++; } raw_spin_unlock_irq(&object->lock); + + /* + * Dedup bookkeeping must happen outside object->lock. + * dedup_record() may call kmalloc(GFP_ATOMIC), and the slab + * path takes locks (n->list_lock, etc.) at a higher + * wait-context level than the raw_spinlock_t object->lock; + * + * Passing object without object->lock here is safe: + * - the surrounding rcu_read_lock() keeps the memory alive + * even if a concurrent kmemleak_free() drops use_count to + * zero and queues free_object_rcu(); + * - dedup_record() only manipulates use_count via the atomic + * get_object()/put_object() helpers and stores the bare + * pointer into the xarray; + * - on the fallback print path it re-acquires object->lock + * before calling print_unreferenced(). + */ + if (trace_handle) + dedup_record(&dedup, object, trace_handle); } rcu_read_unlock(); + /* Flush'em all */ + dedup_flush(&dedup); + xa_destroy(&dedup); if (new_leaks) { kmemleak_found_leaks = true; -- 2.52.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace 2026-04-21 13:45 ` [PATCH 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace Breno Leitao @ 2026-04-23 14:29 ` Catalin Marinas 2026-04-24 9:26 ` Breno Leitao 0 siblings, 1 reply; 9+ messages in thread From: Catalin Marinas @ 2026-04-23 14:29 UTC (permalink / raw) To: Breno Leitao Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-kernel, linux-mm, linux-kselftest, kernel-team On Tue, Apr 21, 2026 at 06:45:04AM -0700, Breno Leitao wrote: > +/* > + * Record a leaked object in the dedup table. The representative object's > + * use_count is incremented so it can be safely dereferenced by dedup_flush() > + * outside the RCU read section; dedup_flush() drops the reference. On > + * allocation failure (or a concurrent insert) the object is printed > + * immediately, preserving today's "always log every leak" guarantee. > + * Caller must not hold object->lock and must hold rcu_read_lock(). > + */ > +static void dedup_record(struct xarray *dedup, struct kmemleak_object *object, > + depot_stack_handle_t trace_handle) > +{ > + struct kmemleak_dedup_entry *entry; > + > + entry = xa_load(dedup, trace_handle); > + if (entry) { > + /* This is a known beast, just increase the counter */ > + entry->count++; > + return; > + } > + > + /* > + * A brand new report. Object will have object->use_count increased > + * in here, and released put_object() at dedup_flush > + */ > + entry = kmalloc(sizeof(*entry), GFP_ATOMIC); Do we need to allocate a structure here? We could instead add a dup_count member in the kmemleak_object and just link the object itself into the xarray. Well, maybe the leak being a rare event is not that bad. > + if (entry && get_object(object)) { > + if (xa_insert(dedup, trace_handle, entry, GFP_ATOMIC) == 0) { I wonder if we need xa_insert() at all. 
Since it's indexed by trace_handle, we could follow similar mechanism like stack_depot with a large hash array, maybe gated by CONFIG_DEBUG_KMEMLEAK_VERBOSE. > + entry->object = object; > + entry->count = 1; > + return; > + } > + put_object(object); > + } > + kfree(entry); > + > + /* > + * Fallback for kmalloc/get_object(): Just print it straight away > + */ > + raw_spin_lock_irq(&object->lock); > + print_unreferenced(NULL, object); > + raw_spin_unlock_irq(&object->lock); > +} > + > +/* > + * Drain the dedup table: print one full record per unique backtrace, > + * followed by a summary line whenever more than one object shared it. > + * Releases the reference dedup_record() took on each representative object. > + */ > +static void dedup_flush(struct xarray *dedup) > +{ > + struct kmemleak_dedup_entry *entry; > + unsigned long idx; > + > + xa_for_each(dedup, idx, entry) { > + raw_spin_lock_irq(&entry->object->lock); > + print_unreferenced(NULL, entry->object); > + raw_spin_unlock_irq(&entry->object->lock); Sashiko has a good point here - while the kmemleak metadata is still around due to an earlier get_object(), the object itself may have been freed and the hex dump in print_unreferenced() could fault (e.g. vunmap'ed object). Same with the print_unreferenced() above. It's probably not worth printing the first bytes of the content anyway when we do coalescing, the content would differ anyway. Also it's possible that the size differs even if the stack trace is the same but I guess we can ignore this. https://sashiko.dev/#/patchset/20260421-kmemleak_dedup-v1-0-65e31c6cdf0c@debian.org -- Catalin ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace 2026-04-23 14:29 ` Catalin Marinas @ 2026-04-24 9:26 ` Breno Leitao 2026-04-24 12:05 ` Catalin Marinas 0 siblings, 1 reply; 9+ messages in thread From: Breno Leitao @ 2026-04-24 9:26 UTC (permalink / raw) To: Catalin Marinas Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-kernel, linux-mm, linux-kselftest, kernel-team On Thu, Apr 23, 2026 at 03:29:08PM +0100, Catalin Marinas wrote: > On Tue, Apr 21, 2026 at 06:45:04AM -0700, Breno Leitao wrote: > > + /* > > + * A brand new report. Object will have object->use_count increased > > + * in here, and released put_object() at dedup_flush > > + */ > > + entry = kmalloc(sizeof(*entry), GFP_ATOMIC); > > Do we need to allocate a structure here? We could instead add a > dup_count member in the kmemleak_object and just link the object itself > into the xarray. Well, maybe the leak being a rare event is not that > bad. Ack, that is a way better approach than this new struct and memory allocation. > > > + if (entry && get_object(object)) { > > + if (xa_insert(dedup, trace_handle, entry, GFP_ATOMIC) == 0) { > > I wonder if we need xa_insert() at all. Since it's indexed by > trace_handle, we could follow similar mechanism like stack_depot with a > large hash array, maybe gated by CONFIG_DEBUG_KMEMLEAK_VERBOSE. This custom method would buy us a few things: no GFP_ATOMIC on the scan path, no xa_node churn, no wait-context concern around object->lock, and the xa_is_err fallback goes away. The cost is a static bucket array (a few tens of KB) always reserved. Keep in mind that this code also change when uses set `kmemleak_verbose`, so, we cannot gate it using CONFIG_DEBUG_KMEMLEAK_VERBOSE. 
The xarray approach keeps the per-scan storage proportional to the number of distinct backtraces actually seen, and scan_mutex already serialises everything, so the GFP_ATOMIC/xa_node overhead only materialises when leaks are being reported. Given how cold the scan path is, I'd lean toward keeping xarray unless you'd prefer the static-table version for maintainability reasons — happy to respin either way. > > + entry->object = object; > > + entry->count = 1; > > + return; > > + } > > + put_object(object); > > + } > > + kfree(entry); > > + > > + /* > > + * Fallback for kmalloc/get_object(): Just print it straight away > > + */ > > + raw_spin_lock_irq(&object->lock); > > + print_unreferenced(NULL, object); > > + raw_spin_unlock_irq(&object->lock); > > +} > > + > > +/* > > + * Drain the dedup table: print one full record per unique backtrace, > > + * followed by a summary line whenever more than one object shared it. > > + * Releases the reference dedup_record() took on each representative object. > > + */ > > +static void dedup_flush(struct xarray *dedup) > > +{ > > + struct kmemleak_dedup_entry *entry; > > + unsigned long idx; > > + > > + xa_for_each(dedup, idx, entry) { > > + raw_spin_lock_irq(&entry->object->lock); > > + print_unreferenced(NULL, entry->object); > > + raw_spin_unlock_irq(&entry->object->lock); > > Sashiko has a good point here - while the kmemleak metadata is still > around due to an earlier get_object(), the object itself may have been > freed and the hex dump in print_unreferenced() could fault (e.g. > vunmap'ed object). Same with the print_unreferenced() above. It's > probably not worth printing the first bytes of the content anyway when > we do coalescing, the content would differ anyway. Agreed, this is a real race once the print is deferred. v2 addresses it two ways: 1. dedup_flush() re-acquires object->lock and re-checks OBJECT_ALLOCATED before printing. 
__delete_object() clears that flag under the same lock before the user memory is released, so the bytes are guaranteed live for the duration of the print. 2. The hex dump is skipped for coalesced entries (dup_count > 1), as you suggested — the content would differ anyway, and skipping it removes the only remaining read of object->pointer's contents in the deferred path. > Also it's possible > that the size differs even if the stack trace is the same but I guess we > can ignore this. Yes, ignoring — the representative's size is what gets printed and the summary line just gives the count. Worth a sentence in the commit message; I'll add one. Thanks for the review and suggestion! In casse you are curious, this is the new approach I am testing now: commit fb356de0592c0de97d793207ef74b0f3f019379a Author: Breno Leitao <leitao@debian.org> Date: Mon Apr 20 09:26:00 2026 -0700 mm/kmemleak: dedupe verbose scan output by allocation backtrace In kmemleak's verbose mode, every unreferenced object found during a scan is logged with its full header, hex dump and 16-frame backtrace. Workloads that leak many objects from a single allocation site flood dmesg with byte-for-byte identical backtraces, drowning out distinct leaks and other kernel messages. Dedupe within each scan using stackdepot's trace_handle as the key: for every leaked object with a recorded stack trace, look up the representative kmemleak_object in a per-scan xarray keyed by trace_handle. The first sighting stores the object pointer (with a get_object() reference) and sets object->dup_count to 1; later sightings just bump dup_count on the representative. After the scan, walk the xarray once and emit each unique backtrace, followed by a single summary line when more than one object shares it. 
Leaks whose trace_handle is 0 (early-boot allocations tracked before kmemleak_init() set up object_cache, or stack_depot_save() failures under memory pressure) cannot be deduped, so they are still printed inline via the same locked OBJECT_ALLOCATED-checked helper. The contents of /sys/kernel/debug/kmemleak are unchanged - only the verbose console output is collapsed. Safety notes: - The xarray store happens outside object->lock: object->lock is a raw spinlock, while xa_store() may grab xa_node slab locks at a higher wait-context level which lockdep flags as invalid. trace_handle is captured under object->lock (which serialises with kmemleak_update_trace()'s writer), so it is safe to use after dropping the lock. - get_object() pins the kmemleak_object metadata across rcu_read_unlock(), but the underlying tracked allocation can still be freed concurrently. The deferred print path therefore re-acquires object->lock and re-checks OBJECT_ALLOCATED via print_leak_locked() before touching object->pointer; __delete_object() clears that flag under the same lock before the user memory goes away. The same helper is used by the trace_handle == 0 and xa_store() failure fallbacks, so every printer in the new path has identical safety guarantees. - If get_object() fails after we set OBJECT_REPORTED, the object is already being torn down (use_count hit zero); the leak count is still accurate but the verbose line is dropped, which is correct - the memory was freed concurrently and is no longer a leak. - If xa_store() fails to allocate an xa_node under memory pressure, we fall back to printing inline via print_leak_locked() instead of silently dropping the leak. - The hex dump is skipped for coalesced entries (dup_count > 1): bytes would differ across objects sharing a backtrace anyway, and skipping it removes the only remaining read of object->pointer's contents in the deferred path. 
The representative's reported size may also differ from the coalesced objects' sizes; the printed trace_handle reflects the representative's current value rather than the value used as the dedup key, which is normally - but not strictly - identical. Signed-off-by: Breno Leitao <leitao@debian.org> diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 2eff0d6b622b6..d521cc71ec1ee 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -92,6 +92,7 @@ #include <linux/nodemask.h> #include <linux/mm.h> #include <linux/workqueue.h> +#include <linux/xarray.h> #include <linux/crc32.h> #include <asm/sections.h> @@ -153,6 +154,8 @@ struct kmemleak_object { /* checksum for detecting modified objects */ u32 checksum; depot_stack_handle_t trace_handle; + /* per-scan dedup count, valid only while in scan-local dedup xarray */ + unsigned int dup_count; /* memory ranges to be scanned inside an object (empty for all) */ struct hlist_head area_list; unsigned long jiffies; /* creation timestamp */ @@ -360,8 +363,9 @@ static const char *__object_type_str(struct kmemleak_object *object) * Printing of the unreferenced objects information to the seq file. The * print_unreferenced function must be called with the object->lock held. 
*/ -static void print_unreferenced(struct seq_file *seq, - struct kmemleak_object *object) +static void __print_unreferenced(struct seq_file *seq, + struct kmemleak_object *object, + bool no_hex_dump) { int i; unsigned long *entries; @@ -373,7 +377,8 @@ static void print_unreferenced(struct seq_file *seq, object->pointer, object->size); warn_or_seq_printf(seq, " comm \"%s\", pid %d, jiffies %lu\n", object->comm, object->pid, object->jiffies); - hex_dump_object(seq, object); + if (!no_hex_dump) + hex_dump_object(seq, object); warn_or_seq_printf(seq, " backtrace (crc %x):\n", object->checksum); for (i = 0; i < nr_entries; i++) { @@ -382,6 +387,12 @@ static void print_unreferenced(struct seq_file *seq, } } +static void print_unreferenced(struct seq_file *seq, + struct kmemleak_object *object) +{ + __print_unreferenced(seq, object, false); +} + /* * Print the kmemleak_object information. This function is used mainly for * debugging special cases when kmemleak operations. It must be called with @@ -1684,6 +1695,103 @@ static void kmemleak_cond_resched(struct kmemleak_object *object) put_object(object); } +/* + * Print one leak inline, re-checking OBJECT_ALLOCATED under the lock so + * the hex dump does not touch user memory that was freed concurrently. + * Used by the dedup_record() fallback paths where we cannot dedup and defer + * printing through the xarray. + */ +static void print_leak_locked(struct kmemleak_object *object, bool no_hex_dump) +{ + raw_spin_lock_irq(&object->lock); + if (object->flags & OBJECT_ALLOCATED) + __print_unreferenced(NULL, object, no_hex_dump); + raw_spin_unlock_irq(&object->lock); +} + +/* + * Per-scan dedup table for verbose leak printing. The xarray is keyed by + * stackdepot trace_handle and stores a pointer to the representative + * kmemleak_object. The per-scan repeat count lives in object->dup_count. 
+ * + * dedup_record() must run outside object->lock: xa_store() may take + * mutexes (xa_node slab allocation) which lockdep would flag against the + * raw spinlock object->lock. + */ +static void dedup_record(struct xarray *dedup, struct kmemleak_object *object, + depot_stack_handle_t trace_handle) +{ + struct kmemleak_object *rep; + void *old; + + /* + * No stack trace to dedup against: early-boot allocation tracked + * before kmemleak_init() set up object_cache, or stack_depot_save() + * failure under memory pressure. + */ + if (!trace_handle) { + print_leak_locked(object, false); + return; + } + + /* stack is available, now we can de-dup */ + rep = xa_load(dedup, trace_handle); + if (rep) { + rep->dup_count++; + return; + } + + /* + * Object is being torn down (use_count already hit zero); the + * tracked memory at object->pointer is unsafe to read, so skip. + */ + if (!get_object(object)) + return; + + object->dup_count = 1; + old = xa_store(dedup, trace_handle, object, GFP_ATOMIC); + if (xa_is_err(old)) { + /* xa_node allocation failed; fall back to inline print. */ + print_leak_locked(object, false); + put_object(object); + return; + } + /* + * scan_mutex serialises all writers to the dedup xarray, so xa_store() + * after a NULL xa_load() must always overwrite an empty slot. + */ + WARN_ON_ONCE(old); +} + +/* + * Drain the dedup table. Re-acquires object->lock and re-checks + * OBJECT_ALLOCATED before printing: while get_object() pins the + * kmemleak_object metadata, the underlying tracked allocation may have + * been freed since the scan walked it (kmemleak_free clears + * OBJECT_ALLOCATED under object->lock before the user memory goes away). + * The hex dump is skipped for coalesced entries since the bytes would + * differ across objects anyway. 
+ */ +static void dedup_flush(struct xarray *dedup) +{ + struct kmemleak_object *object; + unsigned long idx; + unsigned int dup; + bool coalesced; + + xa_for_each(dedup, idx, object) { + dup = object->dup_count; + coalesced = dup > 1; + + print_leak_locked(object, coalesced); + if (coalesced) + pr_warn(" ... and %u more object(s) with the same backtrace\n", + dup - 1); + put_object(object); + xa_erase(dedup, idx); + } +} + /* * Scan data sections and all the referenced memory blocks allocated via the * kernel's standard allocators. This function must be called with the @@ -1694,6 +1802,7 @@ static void kmemleak_scan(void) struct kmemleak_object *object; struct zone *zone; int __maybe_unused i; + struct xarray dedup; int new_leaks = 0; jiffies_last_scan = jiffies; @@ -1834,10 +1943,18 @@ static void kmemleak_scan(void) return; /* - * Scanning result reporting. + * Scanning result reporting. When verbose printing is enabled, dedupe + * by stackdepot trace_handle so each unique backtrace is logged once + * per scan, annotated with the number of objects that share it. The + * per-leak count below still reflects every object, and + * /sys/kernel/debug/kmemleak still lists them individually. 
*/ + xa_init(&dedup); rcu_read_lock(); list_for_each_entry_rcu(object, &object_list, object_list) { + depot_stack_handle_t trace_handle; + bool dedup_print; + if (need_resched()) kmemleak_cond_resched(object); @@ -1849,18 +1966,33 @@ static void kmemleak_scan(void) if (!color_white(object)) continue; raw_spin_lock_irq(&object->lock); + trace_handle = 0; + dedup_print = false; if (unreferenced_object(object) && !(object->flags & OBJECT_REPORTED)) { object->flags |= OBJECT_REPORTED; - - if (kmemleak_verbose) - print_unreferenced(NULL, object); - + if (kmemleak_verbose) { + trace_handle = object->trace_handle; + dedup_print = true; + } new_leaks++; } raw_spin_unlock_irq(&object->lock); + + /* + * Defer the verbose print outside object->lock: xa_store() + * may take xa_node slab locks at a higher wait-context level + * which lockdep would flag against the raw_spinlock_t + * object->lock. rcu_read_lock() keeps the kmemleak_object + * alive across the call. + */ + if (dedup_print) + dedup_record(&dedup, object, trace_handle); } rcu_read_unlock(); + /* Flush'em all */ + dedup_flush(&dedup); + xa_destroy(&dedup); if (new_leaks) { kmemleak_found_leaks = true; ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace
  2026-04-24  9:26   ` Breno Leitao
@ 2026-04-24 12:05     ` Catalin Marinas
  2026-04-24 12:43       ` Breno Leitao
  0 siblings, 1 reply; 9+ messages in thread
From: Catalin Marinas @ 2026-04-24 12:05 UTC (permalink / raw)
To: Breno Leitao
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-kernel,
	linux-mm, linux-kselftest, kernel-team

On Fri, Apr 24, 2026 at 02:26:53AM -0700, Breno Leitao wrote:
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 2eff0d6b622b6..d521cc71ec1ee 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -92,6 +92,7 @@
>  #include <linux/nodemask.h>
>  #include <linux/mm.h>
>  #include <linux/workqueue.h>
> +#include <linux/xarray.h>
>  #include <linux/crc32.h>
>  
>  #include <asm/sections.h>
> @@ -153,6 +154,8 @@ struct kmemleak_object {
>  	/* checksum for detecting modified objects */
>  	u32 checksum;
>  	depot_stack_handle_t trace_handle;
> +	/* per-scan dedup count, valid only while in scan-local dedup xarray */
> +	unsigned int dup_count;

I would add this around the pid_t pid member since both are 32-bit,
better struct compaction. Here we'll get 32-bit padding.

>  	/* memory ranges to be scanned inside an object (empty for all) */
>  	struct hlist_head area_list;
>  	unsigned long jiffies;		/* creation timestamp */
> @@ -360,8 +363,9 @@ static const char *__object_type_str(struct kmemleak_object *object)
>   * Printing of the unreferenced objects information to the seq file. The
>   * print_unreferenced function must be called with the object->lock held.
>   */
> -static void print_unreferenced(struct seq_file *seq,
> -			       struct kmemleak_object *object)
> +static void __print_unreferenced(struct seq_file *seq,
> +				 struct kmemleak_object *object,
> +				 bool no_hex_dump)
>  {
>  	int i;
>  	unsigned long *entries;
> @@ -373,7 +377,8 @@ static void print_unreferenced(struct seq_file *seq,
>  		       object->pointer, object->size);
>  	warn_or_seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu\n",
>  			   object->comm, object->pid, object->jiffies);
> -	hex_dump_object(seq, object);
> +	if (!no_hex_dump)
> +		hex_dump_object(seq, object);

Nit: just use "hex_dump" and avoid double negation.

>  	warn_or_seq_printf(seq, "  backtrace (crc %x):\n", object->checksum);
>  
>  	for (i = 0; i < nr_entries; i++) {
>  	}
>  }
>  
> +static void print_unreferenced(struct seq_file *seq,
> +			       struct kmemleak_object *object)
> +{
> +	__print_unreferenced(seq, object, false);
> +}
> +
>  /*
>   * Print the kmemleak_object information. This function is used mainly for
>   * debugging special cases when kmemleak operations. It must be called with
> @@ -1684,6 +1695,103 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
>  	put_object(object);
>  }
>  
> +/*
> + * Print one leak inline, re-checking OBJECT_ALLOCATED under the lock so
> + * the hex dump does not touch user memory that was freed concurrently.
> + * Used by the dedup_record() fallback paths where we cannot dedup and defer
> + * printing through the xarray.
> + */
> +static void print_leak_locked(struct kmemleak_object *object, bool no_hex_dump)
> +{
> +	raw_spin_lock_irq(&object->lock);
> +	if (object->flags & OBJECT_ALLOCATED)
> +		__print_unreferenced(NULL, object, no_hex_dump);
> +	raw_spin_unlock_irq(&object->lock);

I don't think OBJECT_ALLOCATED should prevent the printing here. If it's
called from dedup_flush() and the first object that kept accumulating
the dup_count is freed, you'd not print anything. I would only use
OBJECT_ALLOCATED to decide whether to do the hex dump if requested.

-- 
Catalin
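[The compaction point above can be sanity-checked from userspace with a cut-down mock of the struct. This is only an illustration: the member list is heavily trimmed and the typedefs are stand-ins, so the absolute sizes differ from the real kmemleak_object; what carries over is the relative cost of the two placements on an LP64 target.]

```c
#include <sys/types.h>	/* pid_t */

typedef unsigned int u32;
typedef unsigned int depot_stack_handle_t;	/* stand-in for <linux/stackdepot.h> */
struct hlist_head { void *first; };		/* stand-in for <linux/types.h> */

/* v1 placement: dup_count right after trace_handle. The next member is
 * pointer-aligned, so the new u32 also drags in 4 bytes of padding. */
struct layout_v1 {
	u32 checksum;
	depot_stack_handle_t trace_handle;
	unsigned int dup_count;		/* +4 bytes, then 4 bytes of padding */
	struct hlist_head area_list;
	unsigned long jiffies;
	pid_t pid;
	char comm[16];
};

/* Suggested placement: paired with the 32-bit pid, where the u32 slots
 * into tail padding the struct already had, so it is free on LP64. */
struct layout_v2 {
	u32 checksum;
	depot_stack_handle_t trace_handle;
	struct hlist_head area_list;
	unsigned long jiffies;
	pid_t pid;
	unsigned int dup_count;		/* reuses an existing hole */
	char comm[16];
};
```

[On x86_64 the first layout is 56 bytes and the second 48: the same new field costs 8 bytes in one spot and nothing in the other.]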
* Re: [PATCH 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace
  2026-04-24 12:05     ` Catalin Marinas
@ 2026-04-24 12:43       ` Breno Leitao
  0 siblings, 0 replies; 9+ messages in thread
From: Breno Leitao @ 2026-04-24 12:43 UTC (permalink / raw)
To: Catalin Marinas
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-kernel,
	linux-mm, linux-kselftest, kernel-team

On Fri, Apr 24, 2026 at 01:05:20PM +0100, Catalin Marinas wrote:
> > @@ -153,6 +154,8 @@ struct kmemleak_object {
> >  	/* checksum for detecting modified objects */
> >  	u32 checksum;
> >  	depot_stack_handle_t trace_handle;
> > +	/* per-scan dedup count, valid only while in scan-local dedup xarray */
> > +	unsigned int dup_count;
> 
> I would add this around the pid_t pid member since both are 32-bit,
> better struct compaction. Here we'll get 32-bit padding.

Ack!

> > -	hex_dump_object(seq, object);
> > +	if (!no_hex_dump)
> > +		hex_dump_object(seq, object);
> 
> Nit: just use "hex_dump" and avoid double negation.

Ack!

> > +static void print_leak_locked(struct kmemleak_object *object, bool no_hex_dump)
> > +{
> > +	raw_spin_lock_irq(&object->lock);
> > +	if (object->flags & OBJECT_ALLOCATED)
> > +		__print_unreferenced(NULL, object, no_hex_dump);
> > +	raw_spin_unlock_irq(&object->lock);
> 
> I don't think OBJECT_ALLOCATED should prevent the printing here. If it's
> called from dedup_flush() and the first object that kept accumulating
> the dup_count is freed, you'd not print anything. I would only use
> OBJECT_ALLOCATED to decide whether to do the hex dump if requested.

That makes sense. I suppose we want something like:

	__print_unreferenced(NULL, object,
			     hex_dump && (object->flags & OBJECT_ALLOCATED));

Thanks for the review so far, I will respin the series,
--breno
* [PATCH 2/2] selftests/mm: add kmemleak verbose dedup test
  2026-04-21 13:45 [PATCH 0/2] mm/kmemleak: dedupe verbose scan output Breno Leitao
  2026-04-21 13:45 ` [PATCH 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace Breno Leitao
@ 2026-04-21 13:45 ` Breno Leitao
  2026-04-24 13:53 ` [PATCH 0/2] mm/kmemleak: dedupe verbose scan output Andrew Morton
  2 siblings, 0 replies; 9+ messages in thread
From: Breno Leitao @ 2026-04-21 13:45 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, Catalin Marinas
Cc: linux-kernel, linux-mm, linux-kselftest, kernel-team, Breno Leitao

Exercise the per-scan dedup of kmemleak's verbose leak output added in
the previous commit.

The test depends on the kmemleak-test sample module
(CONFIG_SAMPLE_KMEMLEAK=m): load and unload it to orphan ten list
entries from a single kzalloc() call site that all share one stackdepot
trace_handle, trigger two scans, and assert that the number of
"unreferenced object" lines printed in dmesg is strictly less than the
number of leaks reported.

Skip cleanly when kmemleak is absent, disabled at runtime, or
CONFIG_SAMPLE_KMEMLEAK is not built.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 tools/testing/selftests/mm/test_kmemleak_dedup.sh | 86 +++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/tools/testing/selftests/mm/test_kmemleak_dedup.sh b/tools/testing/selftests/mm/test_kmemleak_dedup.sh
new file mode 100755
index 0000000000000..1a1b6efd6470a
--- /dev/null
+++ b/tools/testing/selftests/mm/test_kmemleak_dedup.sh
@@ -0,0 +1,86 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Verify that kmemleak's verbose scan output deduplicates leaks that share
+# the same allocation backtrace. The kmemleak-test module leaks 10 list
+# entries from a single kzalloc() call site, so they share one stackdepot
+# trace_handle. With dedup, only one "unreferenced object" line should be
+# printed for that backtrace per scan, while the per-scan leak counter
+# still accounts for every object.
+#
+# The expected output is something like:
+#   PASS: kmemleak verbose output deduplicated (11 printed for 61 leaks)
+#
+# Author: Breno Leitao <leitao@debian.org>
+
+ksft_skip=4
+KMEMLEAK=/sys/kernel/debug/kmemleak
+VERBOSE_PARAM=/sys/module/kmemleak/parameters/verbose
+MODULE=kmemleak-test
+
+skip() {
+	echo "SKIP: $*"
+	exit $ksft_skip
+}
+
+fail() {
+	echo "FAIL: $*"
+	exit 1
+}
+
+[ "$(id -u)" -eq 0 ] || skip "must run as root"
+[ -r "$KMEMLEAK" ] || skip "no kmemleak debugfs (CONFIG_DEBUG_KMEMLEAK)"
+[ -w "$VERBOSE_PARAM" ] || skip "kmemleak verbose param missing"
+modinfo "$MODULE" >/dev/null 2>&1 ||
+	skip "$MODULE not built (CONFIG_SAMPLE_KMEMLEAK)"
+
+# kmemleak can be present but disabled at runtime (boot arg kmemleak=off,
+# or it self-disabled after an internal error). In that state writes other
+# than "clear" return EPERM, so probe once and skip if so.
+if ! echo scan > "$KMEMLEAK" 2>/dev/null; then
+	skip "kmemleak is disabled (check dmesg or kmemleak= boot arg)"
+fi
+
+prev_verbose=$(cat "$VERBOSE_PARAM")
+cleanup() {
+	echo "$prev_verbose" > "$VERBOSE_PARAM" 2>/dev/null
+	rmmod "$MODULE" 2>/dev/null
+}
+trap cleanup EXIT
+
+echo 1 > "$VERBOSE_PARAM"
+
+# Drain the existing leak set so the next scan only reports our objects.
+echo clear > "$KMEMLEAK"
+
+modprobe "$MODULE" || fail "failed to load $MODULE"
+# Removing the module orphans the list elements without freeing them.
+rmmod "$MODULE" || fail "failed to unload $MODULE"
+
+# Two scans: kmemleak requires the object to survive a full scan cycle
+# before it is reported as unreferenced.
+dmesg -C >/dev/null
+echo scan > "$KMEMLEAK"; sleep 6
+echo scan > "$KMEMLEAK"; sleep 6
+
+log=$(dmesg)
+
+new_leaks=$(echo "$log" |
+	sed -n 's/.*kmemleak: \([0-9]\+\) new suspected.*/\1/p' | tail -1)
+[ -n "$new_leaks" ] || fail "no 'new suspected memory leaks' line found"
+
+# Count "unreferenced object" lines emitted in verbose output.
+printed=$(echo "$log" | grep -c 'kmemleak: unreferenced object')
+
+echo "new_leaks=$new_leaks printed=$printed"
+
+# The kzalloc(sizeof(*elem)) loop alone contributes 10 leaks sharing one
+# backtrace, so without dedup printed >= 10. With dedup the printed count
+# must be strictly less than the reported leak total.
+[ "$new_leaks" -ge 10 ] || fail "expected >=10 new leaks, got $new_leaks"
+[ "$printed" -lt "$new_leaks" ] || \
+	fail "no dedup: printed=$printed new_leaks=$new_leaks"
+
+echo "PASS: kmemleak verbose output deduplicated" \
+	"($printed printed for $new_leaks leaks)"
+exit 0
-- 
2.52.0
* Re: [PATCH 0/2] mm/kmemleak: dedupe verbose scan output
  2026-04-21 13:45 [PATCH 0/2] mm/kmemleak: dedupe verbose scan output Breno Leitao
  2026-04-21 13:45 ` [PATCH 1/2] mm/kmemleak: dedupe verbose scan output by allocation backtrace Breno Leitao
  2026-04-21 13:45 ` [PATCH 2/2] selftests/mm: add kmemleak verbose dedup test Breno Leitao
@ 2026-04-24 13:53 ` Andrew Morton
  2026-04-24 14:36   ` Breno Leitao
  2 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2026-04-24 13:53 UTC (permalink / raw)
To: Breno Leitao
Cc: David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Shuah Khan, Catalin Marinas, linux-kernel, linux-mm,
	linux-kselftest, kernel-team

On Tue, 21 Apr 2026 06:45:03 -0700 Breno Leitao <leitao@debian.org> wrote:

> I am starting to run with kmemleak in verbose enabled in some "probe
> points" across the my employers fleet so that suspected leaks land in
> dmesg without needing a separate read of /sys/kernel/debug/kmemleak.
> 
> The downside is that workloads which leak many objects from a single
> allocation site flood the console with byte-for-byte identical
> backtraces. Hundreds of duplicates per scan are common, drowning out
> distinct leaks and unrelated kernel messages, while adding no signal
> beyond the first occurrence.
> 
> This series collapses those duplicates inside kmemleak itself. Each
> unique stackdepot trace_handle prints once per scan, followed by a
> short summary line when more than one object shares it:

AI review:
https://sashiko.dev/#/patchset/20260421-kmemleak_dedup-v1-0-65e31c6cdf0c@debian.org
* Re: [PATCH 0/2] mm/kmemleak: dedupe verbose scan output
  2026-04-24 13:53 ` [PATCH 0/2] mm/kmemleak: dedupe verbose scan output Andrew Morton
@ 2026-04-24 14:36   ` Breno Leitao
  0 siblings, 0 replies; 9+ messages in thread
From: Breno Leitao @ 2026-04-24 14:36 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Shuah Khan, Catalin Marinas, linux-kernel, linux-mm,
	linux-kselftest, kernel-team

Hello Andrew,

On Fri, Apr 24, 2026 at 06:53:25AM -0700, Andrew Morton wrote:
> On Tue, 21 Apr 2026 06:45:03 -0700 Breno Leitao <leitao@debian.org> wrote:
> 
> > I am starting to run with kmemleak in verbose enabled in some "probe
> > points" across the my employers fleet so that suspected leaks land in
> > dmesg without needing a separate read of /sys/kernel/debug/kmemleak.
> > 
> > The downside is that workloads which leak many objects from a single
> > allocation site flood the console with byte-for-byte identical
> > backtraces. Hundreds of duplicates per scan are common, drowning out
> > distinct leaks and unrelated kernel messages, while adding no signal
> > beyond the first occurrence.
> > 
> > This series collapses those duplicates inside kmemleak itself. Each
> > unique stackdepot trace_handle prints once per scan, followed by a
> > short summary line when more than one object shares it:
> 
> AI review:
> https://sashiko.dev/#/patchset/20260421-kmemleak_dedup-v1-0-65e31c6cdf0c@debian.org

V2 will have them addressed. Here are some of the answers to the
questions raised by Sashiko.

> Can print_unreferenced() access freed memory here and in the fallback
> path above? Since the lock is dropped and reacquired, do we need to
> re-check object->flags & OBJECT_ALLOCATED before printing?

v2 introduces print_leak_locked(), which re-acquires object->lock and
gates the hex dump on OBJECT_ALLOCATED:

	static void print_leak_locked(struct kmemleak_object *object, bool hex_dump)
	{
		raw_spin_lock_irq(&object->lock);
		__print_unreferenced(NULL, object,
				     hex_dump && (object->flags & OBJECT_ALLOCATED));
		raw_spin_unlock_irq(&object->lock);
	}

hex_dump_object() is the only path that reads object->pointer's user
memory; the rest of the report (backtrace, comm/pid/jiffies, checksum)
lives in the kmemleak_object metadata, which get_object() keeps alive.
__delete_object() clears OBJECT_ALLOCATED under object->lock before the
user memory goes away, so the recheck is sufficient.

> If get_object(object) failed, it means the object's reference count is
> already 0 and it is actively being deleted. Unconditionally locking and
> dumping it there seems like it will read freed memory.

Fixed in v2 by reordering: get_object() is now attempted before
xa_store(), and on failure we simply skip the object - the leak count
was already incremented, and the memory has been freed concurrently so
it's no longer a leak.

> What happens to valid memory leaks that failed to record a stack trace
> (e.g. due to memory pressure or context limits)? Will these leaks also
> be permanently ignored in all future scans?

Also fixed in v2. dedup_record() now starts with:

	if (!trace_handle) {
		print_leak_locked(object, true);
		return;
	}

so leaks with trace_handle == 0 (early-boot allocations tracked before
kmemleak_init() set up object_cache, or stack_depot_save() failures
under memory pressure) are printed inline through the same locked
helper.
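[The flow described in these answers — the first object with a given trace_handle prints in full, later ones only bump dup_count, and an end-of-scan flush emits the "... and N more" summary — can be modeled in ordinary userspace C. This is strictly a sketch for readers following the thread: the names mirror the patch loosely, but the real code keys a scan-local xarray by the stackdepot handle and works on kmemleak_object under its lock, whereas this mock uses a flat table and plain return values.]

```c
#include <stdio.h>

#define MAX_HANDLES 64	/* arbitrary cap for the mock table */

struct scan_dedup {
	unsigned int handle[MAX_HANDLES];	/* trace handles seen this scan */
	unsigned int dup_count[MAX_HANDLES];	/* extra objects sharing each one */
	int nr;
};

/* Returns 1 if this handle is new for the current scan (caller prints the
 * full report), 0 if it is a duplicate (only the counter is bumped). */
static int dedup_record(struct scan_dedup *s, unsigned int trace_handle)
{
	int i;

	for (i = 0; i < s->nr; i++) {
		if (s->handle[i] == trace_handle) {
			s->dup_count[i]++;
			return 0;
		}
	}
	if (s->nr < MAX_HANDLES) {	/* table full: fall back to printing */
		s->handle[s->nr] = trace_handle;
		s->dup_count[s->nr] = 0;
		s->nr++;
	}
	return 1;
}

/* End-of-scan pass: one summary line per backtrace that had duplicates,
 * then reset the table for the next scan. */
static void dedup_flush(struct scan_dedup *s)
{
	int i;

	for (i = 0; i < s->nr; i++)
		if (s->dup_count[i])
			printf("... and %u more object(s) with the same backtrace\n",
			       s->dup_count[i]);
	s->nr = 0;
}
```

[Feeding ten objects with one shared handle through dedup_record() yields a single "print" plus dup_count == 9, which dedup_flush() turns into one summary line — the shape of output the selftest in patch 2 asserts on.]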