* [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks
@ 2026-06-12 15:16 Breno Leitao
2026-06-12 16:52 ` Lance Yang
2026-06-12 17:11 ` Catalin Marinas
0 siblings, 2 replies; 3+ messages in thread
From: Breno Leitao @ 2026-06-12 15:16 UTC (permalink / raw)
To: Catalin Marinas, Andrew Morton, lance.yang, Davidlohr Bueso,
Oleg Nesterov, Qian Cai
Cc: linux-mm, linux-kernel, kernel-team, stable, Breno Leitao
kmemleak_scan() walks every thread and scans its kernel stack under a
single rcu_read_lock() with no reschedule point. On a host with very
many threads -- amplified by KASAN/lockdep in debug builds -- this loop
can hog a CPU long enough to trip the soft lockup watchdog:
watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
scan_block
kmemleak_scan
kmemleak_scan_thread
kthread
A cond_resched() cannot be added directly: the loop runs inside an RCU
read-side critical section.
Borrow the rcu_lock_break() pattern from kernel/hung_task.c: when a
reschedule is needed, pin the two iteration cursors, drop the RCU read
lock, cond_resched(), then re-acquire it and continue only if both
cursors are still hashed.
If a cursor was unhashed while the lock was dropped, the thread list
cannot be walked further, so the round is aborted. Such a round scans
only part of the task stacks, which would make live objects look
unreferenced, so reuse the existing "scan interrupted" path to skip
reporting; the next full scan reports the real leaks.
Fixes: c4b28963fd79 ("mm/kmemleak: rely on rcu for task stack scanning")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
- Do not create the nasty array, but use the same pattern as
kernel/hung_task.c.
- Link to v1: https://lore.kernel.org/r/20260611-kmemleak-stack-resched-v1-1-d6248ade5f4a@debian.org
---
mm/kmemleak.c | 42 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 7c7ba17ce7af0..d88274dc0c605 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1695,6 +1695,32 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
put_object(object);
}
+/*
+ * Briefly drop the RCU read lock to reschedule during the task stack scan.
+ * Both cursors are pinned across the gap; return false if either one was
+ * unhashed meanwhile, so the caller stops this round instead of walking a
+ * stale list.
+ */
+static bool kmemleak_stack_scan_break(struct task_struct *g,
+ struct task_struct *p)
+{
+ bool can_cont;
+
+ get_task_struct(g);
+ get_task_struct(p);
+
+ rcu_read_unlock();
+ cond_resched();
+ rcu_read_lock();
+
+ can_cont = pid_alive(g) && pid_alive(p);
+
+ put_task_struct(p);
+ put_task_struct(g);
+
+ return can_cont;
+}
+
/*
* Print one leak inline. The hex dump is gated on OBJECT_ALLOCATED so it
* does not touch user memory that was freed concurrently; the rest of the
@@ -1804,6 +1830,7 @@ static void kmemleak_scan(void)
int __maybe_unused i;
struct xarray dedup;
int new_leaks = 0;
+ bool aborted = false;
jiffies_last_scan = jiffies;
@@ -1890,11 +1917,21 @@ static void kmemleak_scan(void)
rcu_read_lock();
for_each_process_thread(g, p) {
void *stack = try_get_task_stack(p);
+
if (stack) {
scan_block(stack, stack + THREAD_SIZE, NULL);
put_task_stack(p);
}
+ /*
+ * This is an expensive loop, we must to call the
+ * scheduler to avoid lockups
+ */
+ if (need_resched() && !kmemleak_stack_scan_break(g, p)) {
+ aborted = true;
+ goto unlock;
+ }
}
+unlock:
rcu_read_unlock();
}
@@ -1937,9 +1974,10 @@ static void kmemleak_scan(void)
scan_gray_list();
/*
- * If scanning was stopped do not report any new unreferenced objects.
+ * If scanning was stopped or a stack scan round was aborted, do not
+ * report any new unreferenced objects.
*/
- if (scan_should_stop())
+ if (scan_should_stop() || aborted)
return;
/*
---
base-commit: abe651837cb394f76d738a7a747322fca3bf17ba
change-id: 20260611-kmemleak-stack-resched-01ed72858a7f
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks
2026-06-12 15:16 [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks Breno Leitao
@ 2026-06-12 16:52 ` Lance Yang
2026-06-12 17:11 ` Catalin Marinas
1 sibling, 0 replies; 3+ messages in thread
From: Lance Yang @ 2026-06-12 16:52 UTC (permalink / raw)
To: leitao
Cc: catalin.marinas, akpm, lance.yang, dave, oleg, cai, linux-mm,
linux-kernel, kernel-team, stable
On Fri, Jun 12, 2026 at 08:16:07AM -0700, Breno Leitao wrote:
>kmemleak_scan() walks every thread and scans its kernel stack under a
>single rcu_read_lock() with no reschedule point. On a host with very
>many threads -- amplified by KASAN/lockdep in debug builds -- this loop
>can hog a CPU long enough to trip the soft lockup watchdog:
>
> watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
> scan_block
> kmemleak_scan
> kmemleak_scan_thread
> kthread
>
>A cond_resched() cannot be added directly: the loop runs inside an RCU
>read-side critical section.
>
>Borrow the rcu_lock_break() pattern from kernel/hung_task.c: when a
>reschedule is needed, pin the two iteration cursors, drop the RCU read
>lock, cond_resched(), then re-acquire it and continue only if both
>cursors are still hashed.
>
>If a cursor was unhashed while the lock was dropped, the thread list
>cannot be walked further, so the round is aborted. Such a round scans
>only part of the task stacks, which would make live objects look
>unreferenced, so reuse the existing "scan interrupted" path to skip
>reporting; the next full scan reports the real leaks.
TBH, a bit dense to me as written ...
>Fixes: c4b28963fd79 ("mm/kmemleak: rely on rcu for task stack scanning")
>Cc: stable@vger.kernel.org
>Signed-off-by: Breno Leitao <leitao@debian.org>
>---
>Changes in v2:
>- Do not create the nasty array, but use the same pattern as
> kernel/hung_task.c.
>- Link to v1: https://lore.kernel.org/r/20260611-kmemleak-stack-resched-v1-1-d6248ade5f4a@debian.org
>---
> mm/kmemleak.c | 42 ++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 40 insertions(+), 2 deletions(-)
>
>diff --git a/mm/kmemleak.c b/mm/kmemleak.c
>index 7c7ba17ce7af0..d88274dc0c605 100644
>--- a/mm/kmemleak.c
>+++ b/mm/kmemleak.c
>@@ -1695,6 +1695,32 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
> put_object(object);
> }
>
>+/*
>+ * Briefly drop the RCU read lock to reschedule during the task stack scan.
>+ * Both cursors are pinned across the gap; return false if either one was
>+ * unhashed meanwhile, so the caller stops this round instead of walking a
>+ * stale list.
>+ */
Personally, looks a bit clunky to me with "gap" and "unhashed" ...
Maybe:
"
Drop RCU long enough to reschedule during task stack scanning. Keep both
cursors alive while RCU is dropped; return false if either cursor can no
longer continue the walk.
"
>+static bool kmemleak_stack_scan_break(struct task_struct *g,
>+ struct task_struct *p)
>+{
>+ bool can_cont;
>+
>+ get_task_struct(g);
>+ get_task_struct(p);
>+
>+ rcu_read_unlock();
>+ cond_resched();
>+ rcu_read_lock();
>+
>+ can_cont = pid_alive(g) && pid_alive(p);
>+
>+ put_task_struct(p);
>+ put_task_struct(g);
>+
>+ return can_cont;
>+}
>+
> /*
> * Print one leak inline. The hex dump is gated on OBJECT_ALLOCATED so it
> * does not touch user memory that was freed concurrently; the rest of the
>@@ -1804,6 +1830,7 @@ static void kmemleak_scan(void)
> int __maybe_unused i;
> struct xarray dedup;
> int new_leaks = 0;
>+ bool aborted = false;
>
> jiffies_last_scan = jiffies;
>
>@@ -1890,11 +1917,21 @@ static void kmemleak_scan(void)
> rcu_read_lock();
> for_each_process_thread(g, p) {
> void *stack = try_get_task_stack(p);
>+
> if (stack) {
> scan_block(stack, stack + THREAD_SIZE, NULL);
> put_task_stack(p);
> }
>+ /*
>+ * This is an expensive loop, we must to call the
>+ * scheduler to avoid lockups
>+ */
need_resched() plus the helper name already says most of it. Maybe just:
"
Break the RCU read-side section before rescheduling.
"
>+ if (need_resched() && !kmemleak_stack_scan_break(g, p)) {
>+ aborted = true;
>+ goto unlock;
>+ }
> }
>+unlock:
> rcu_read_unlock();
> }
>
>@@ -1937,9 +1974,10 @@ static void kmemleak_scan(void)
> scan_gray_list();
>
> /*
>- * If scanning was stopped do not report any new unreferenced objects.
>+ * If scanning was stopped or a stack scan round was aborted, do not
>+ * report any new unreferenced objects.
> */
Maybe just say "stack root scan was incomplete" here? That's the actual
reason we skip reporting.
"
If scanning was stopped or the stack root scan was incomplete, do not
report any new unreferenced objects.
"
>- if (scan_should_stop())
>+ if (scan_should_stop() || aborted)
> return;
>
> /*
>
>---
Apart from that, feel free to add:
Acked-by: Lance Yang <lance.yang@linux.dev>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks
2026-06-12 15:16 [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks Breno Leitao
2026-06-12 16:52 ` Lance Yang
@ 2026-06-12 17:11 ` Catalin Marinas
1 sibling, 0 replies; 3+ messages in thread
From: Catalin Marinas @ 2026-06-12 17:11 UTC (permalink / raw)
To: Breno Leitao
Cc: Andrew Morton, lance.yang, Davidlohr Bueso, Oleg Nesterov,
Qian Cai, linux-mm, linux-kernel, kernel-team, stable
Hi Breno,
Thanks for addressing this long-standing soft lockup problem.
On Fri, Jun 12, 2026 at 08:16:07AM -0700, Breno Leitao wrote:
> +/*
> + * Briefly drop the RCU read lock to reschedule during the task stack scan.
> + * Both cursors are pinned across the gap; return false if either one was
> + * unhashed meanwhile, so the caller stops this round instead of walking a
> + * stale list.
> + */
> +static bool kmemleak_stack_scan_break(struct task_struct *g,
> + struct task_struct *p)
> +{
> + bool can_cont;
> +
> + get_task_struct(g);
> + get_task_struct(p);
> +
> + rcu_read_unlock();
> + cond_resched();
> + rcu_read_lock();
> +
> + can_cont = pid_alive(g) && pid_alive(p);
> +
> + put_task_struct(p);
> + put_task_struct(g);
> +
> + return can_cont;
> +}
While this matches rcu_lock_break(), it looks to me like we rely too
much on the internals of kernel/exit.c. Ideally this function should be
provided as an API alongside for_each_process_thread() so that we only
have the idiom in one place in case something changes in the future.
Yet anther variant below, untested. Basically, it follows the
next_tgid() or task_seq_get_next() approach (we might as well move this
to a separate function to avoid excessive indentation):
if (kmemleak_stack_scan) {
struct pid *pid;
int nr = 1;
do {
struct task_struct *p = NULL;
rcu_read_lock();
pid = find_ge_pid(nr, &init_pid_ns);
if (pid) {
nr = pid_nr(pid) + 1;
p = pid_task(pid, PIDTYPE_PID);
if (p)
get_task_struct(p);
}
rcu_read_unlock();
if (p) {
void *stack = try_get_task_stack(p);
if (stack) {
scan_block(stack, stack + THREAD_SIZE,
NULL);
put_task_stack(p);
}
put_task_struct(p);
}
cond_resched();
} while (pid);
}
--
Catalin
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-12 17:11 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-12 15:16 [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks Breno Leitao
2026-06-12 16:52 ` Lance Yang
2026-06-12 17:11 ` Catalin Marinas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox