[PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks

Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks
@ 2026-06-12 15:16 Breno Leitao
  2026-06-12 16:52 ` Lance Yang
  2026-06-12 17:11 ` Catalin Marinas
  0 siblings, 2 replies; 3+ messages in thread
From: Breno Leitao @ 2026-06-12 15:16 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, lance.yang, Davidlohr Bueso,
	Oleg Nesterov, Qian Cai
  Cc: linux-mm, linux-kernel, kernel-team, stable, Breno Leitao

kmemleak_scan() walks every thread and scans its kernel stack under a
single rcu_read_lock() with no reschedule point. On a host with very
many threads -- amplified by KASAN/lockdep in debug builds -- this loop
can hog a CPU long enough to trip the soft lockup watchdog:

  watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
   scan_block
   kmemleak_scan
   kmemleak_scan_thread
   kthread

A cond_resched() cannot be added directly: the loop runs inside an RCU
read-side critical section.

Borrow the rcu_lock_break() pattern from kernel/hung_task.c: when a
reschedule is needed, pin the two iteration cursors, drop the RCU read
lock, cond_resched(), then re-acquire it and continue only if both
cursors are still hashed.

If a cursor was unhashed while the lock was dropped, the thread list
cannot be walked further, so the round is aborted. Such a round scans
only part of the task stacks, which would make live objects look
unreferenced, so reuse the existing "scan interrupted" path to skip
reporting; the next full scan reports the real leaks.

Fixes: c4b28963fd79 ("mm/kmemleak: rely on rcu for task stack scanning")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
- Do not create the nasty array, but use the same pattern as
  kernel/hung_task.c.
- Link to v1: https://lore.kernel.org/r/20260611-kmemleak-stack-resched-v1-1-d6248ade5f4a@debian.org
---
 mm/kmemleak.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 7c7ba17ce7af0..d88274dc0c605 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1695,6 +1695,32 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
 	put_object(object);
 }
 
+/*
+ * Briefly drop the RCU read lock to reschedule during the task stack scan.
+ * Both cursors are pinned across the gap; return false if either one was
+ * unhashed meanwhile, so the caller stops this round instead of walking a
+ * stale list.
+ */
+static bool kmemleak_stack_scan_break(struct task_struct *g,
+				      struct task_struct *p)
+{
+	bool can_cont;
+
+	get_task_struct(g);
+	get_task_struct(p);
+
+	rcu_read_unlock();
+	cond_resched();
+	rcu_read_lock();
+
+	can_cont = pid_alive(g) && pid_alive(p);
+
+	put_task_struct(p);
+	put_task_struct(g);
+
+	return can_cont;
+}
+
 /*
  * Print one leak inline. The hex dump is gated on OBJECT_ALLOCATED so it
  * does not touch user memory that was freed concurrently; the rest of the
@@ -1804,6 +1830,7 @@ static void kmemleak_scan(void)
 	int __maybe_unused i;
 	struct xarray dedup;
 	int new_leaks = 0;
+	bool aborted = false;
 
 	jiffies_last_scan = jiffies;
 
@@ -1890,11 +1917,21 @@ static void kmemleak_scan(void)
 		rcu_read_lock();
 		for_each_process_thread(g, p) {
 			void *stack = try_get_task_stack(p);
+
 			if (stack) {
 				scan_block(stack, stack + THREAD_SIZE, NULL);
 				put_task_stack(p);
 			}
+			/*
+			 * This is an expensive loop, we must to call the
+			 * scheduler to avoid lockups
+			 */
+			if (need_resched() && !kmemleak_stack_scan_break(g, p)) {
+				aborted = true;
+				goto unlock;
+			}
 		}
+unlock:
 		rcu_read_unlock();
 	}
 
@@ -1937,9 +1974,10 @@ static void kmemleak_scan(void)
 	scan_gray_list();
 
 	/*
-	 * If scanning was stopped do not report any new unreferenced objects.
+	 * If scanning was stopped or a stack scan round was aborted, do not
+	 * report any new unreferenced objects.
 	 */
-	if (scan_should_stop())
+	if (scan_should_stop() || aborted)
 		return;
 
 	/*

---
base-commit: abe651837cb394f76d738a7a747322fca3bf17ba
change-id: 20260611-kmemleak-stack-resched-01ed72858a7f

Best regards,
-- 
Breno Leitao <leitao@debian.org>



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks
  2026-06-12 15:16 [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks Breno Leitao
@ 2026-06-12 16:52 ` Lance Yang
  2026-06-12 17:11 ` Catalin Marinas
  1 sibling, 0 replies; 3+ messages in thread
From: Lance Yang @ 2026-06-12 16:52 UTC (permalink / raw)
  To: leitao
  Cc: catalin.marinas, akpm, lance.yang, dave, oleg, cai, linux-mm,
	linux-kernel, kernel-team, stable


On Fri, Jun 12, 2026 at 08:16:07AM -0700, Breno Leitao wrote:
>kmemleak_scan() walks every thread and scans its kernel stack under a
>single rcu_read_lock() with no reschedule point. On a host with very
>many threads -- amplified by KASAN/lockdep in debug builds -- this loop
>can hog a CPU long enough to trip the soft lockup watchdog:
>
>  watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
>   scan_block
>   kmemleak_scan
>   kmemleak_scan_thread
>   kthread
>
>A cond_resched() cannot be added directly: the loop runs inside an RCU
>read-side critical section.
>
>Borrow the rcu_lock_break() pattern from kernel/hung_task.c: when a
>reschedule is needed, pin the two iteration cursors, drop the RCU read
>lock, cond_resched(), then re-acquire it and continue only if both
>cursors are still hashed.
>
>If a cursor was unhashed while the lock was dropped, the thread list
>cannot be walked further, so the round is aborted. Such a round scans
>only part of the task stacks, which would make live objects look
>unreferenced, so reuse the existing "scan interrupted" path to skip
>reporting; the next full scan reports the real leaks.

TBH, a bit dense to me as written ...

>Fixes: c4b28963fd79 ("mm/kmemleak: rely on rcu for task stack scanning")
>Cc: stable@vger.kernel.org
>Signed-off-by: Breno Leitao <leitao@debian.org>
>---
>Changes in v2:
>- Do not create the nasty array, but use the same pattern as
>  kernel/hung_task.c.
>- Link to v1: https://lore.kernel.org/r/20260611-kmemleak-stack-resched-v1-1-d6248ade5f4a@debian.org
>---
> mm/kmemleak.c | 42 ++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 40 insertions(+), 2 deletions(-)
>
>diff --git a/mm/kmemleak.c b/mm/kmemleak.c
>index 7c7ba17ce7af0..d88274dc0c605 100644
>--- a/mm/kmemleak.c
>+++ b/mm/kmemleak.c
>@@ -1695,6 +1695,32 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
> 	put_object(object);
> }
> 
>+/*
>+ * Briefly drop the RCU read lock to reschedule during the task stack scan.
>+ * Both cursors are pinned across the gap; return false if either one was
>+ * unhashed meanwhile, so the caller stops this round instead of walking a
>+ * stale list.
>+ */

Personally, looks a bit clunky to me with "gap" and "unhashed" ...

Maybe:

"
Drop RCU long enough to reschedule during task stack scanning. Keep both
cursors alive while RCU is dropped; return false if either cursor can no
longer continue the walk.
"

>+static bool kmemleak_stack_scan_break(struct task_struct *g,
>+				      struct task_struct *p)
>+{
>+	bool can_cont;
>+
>+	get_task_struct(g);
>+	get_task_struct(p);
>+
>+	rcu_read_unlock();
>+	cond_resched();
>+	rcu_read_lock();
>+
>+	can_cont = pid_alive(g) && pid_alive(p);
>+
>+	put_task_struct(p);
>+	put_task_struct(g);
>+
>+	return can_cont;
>+}
>+
> /*
>  * Print one leak inline. The hex dump is gated on OBJECT_ALLOCATED so it
>  * does not touch user memory that was freed concurrently; the rest of the
>@@ -1804,6 +1830,7 @@ static void kmemleak_scan(void)
> 	int __maybe_unused i;
> 	struct xarray dedup;
> 	int new_leaks = 0;
>+	bool aborted = false;
> 
> 	jiffies_last_scan = jiffies;
> 
>@@ -1890,11 +1917,21 @@ static void kmemleak_scan(void)
> 		rcu_read_lock();
> 		for_each_process_thread(g, p) {
> 			void *stack = try_get_task_stack(p);
>+
> 			if (stack) {
> 				scan_block(stack, stack + THREAD_SIZE, NULL);
> 				put_task_stack(p);
> 			}
>+			/*
>+			 * This is an expensive loop, we must to call the
>+			 * scheduler to avoid lockups
>+			 */

need_resched() plus the helper name already says most of it. Maybe just:

"
Break the RCU read-side section before rescheduling.
"

>+			if (need_resched() && !kmemleak_stack_scan_break(g, p)) {
>+				aborted = true;
>+				goto unlock;
>+			}
> 		}
>+unlock:
> 		rcu_read_unlock();
> 	}
> 
>@@ -1937,9 +1974,10 @@ static void kmemleak_scan(void)
> 	scan_gray_list();
> 
> 	/*
>-	 * If scanning was stopped do not report any new unreferenced objects.
>+	 * If scanning was stopped or a stack scan round was aborted, do not
>+	 * report any new unreferenced objects.
> 	 */

Maybe just say "stack root scan was incomplete" here? That's the actual
reason we skip reporting.

"
If scanning was stopped or the stack root scan was incomplete, do not
report any new unreferenced objects.
"

>-	if (scan_should_stop())
>+	if (scan_should_stop() || aborted)
> 		return;
> 
> 	/*
>
>---

Apart from that, feel free to add:

Acked-by: Lance Yang <lance.yang@linux.dev>


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks
  2026-06-12 15:16 [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks Breno Leitao
  2026-06-12 16:52 ` Lance Yang
@ 2026-06-12 17:11 ` Catalin Marinas
  1 sibling, 0 replies; 3+ messages in thread
From: Catalin Marinas @ 2026-06-12 17:11 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, lance.yang, Davidlohr Bueso, Oleg Nesterov,
	Qian Cai, linux-mm, linux-kernel, kernel-team, stable

Hi Breno,

Thanks for addressing this long-standing soft lockup problem.

On Fri, Jun 12, 2026 at 08:16:07AM -0700, Breno Leitao wrote:
> +/*
> + * Briefly drop the RCU read lock to reschedule during the task stack scan.
> + * Both cursors are pinned across the gap; return false if either one was
> + * unhashed meanwhile, so the caller stops this round instead of walking a
> + * stale list.
> + */
> +static bool kmemleak_stack_scan_break(struct task_struct *g,
> +				      struct task_struct *p)
> +{
> +	bool can_cont;
> +
> +	get_task_struct(g);
> +	get_task_struct(p);
> +
> +	rcu_read_unlock();
> +	cond_resched();
> +	rcu_read_lock();
> +
> +	can_cont = pid_alive(g) && pid_alive(p);
> +
> +	put_task_struct(p);
> +	put_task_struct(g);
> +
> +	return can_cont;
> +}

While this matches rcu_lock_break(), it looks to me like we rely too
much on the internals of kernel/exit.c. Ideally this function should be
provided as an API alongside for_each_process_thread() so that we only
have the idiom in one place in case something changes in the future.

Yet anther variant below, untested. Basically, it follows the
next_tgid() or task_seq_get_next() approach (we might as well move this
to a separate function to avoid excessive indentation):

	if (kmemleak_stack_scan) {
		struct pid *pid;
		int nr = 1;

		do {
			struct task_struct *p = NULL;

			rcu_read_lock();
			pid = find_ge_pid(nr, &init_pid_ns);
			if (pid) {
				nr = pid_nr(pid) + 1;
				p = pid_task(pid, PIDTYPE_PID);
				if (p)
					get_task_struct(p);
			}
			rcu_read_unlock();

			if (p) {
				void *stack = try_get_task_stack(p);

				if (stack) {
					scan_block(stack, stack + THREAD_SIZE,
							NULL);
					put_task_stack(p);
				}
				put_task_struct(p);
			}
			cond_resched();
		} while (pid);
	}

-- 
Catalin


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-12 17:11 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-12 15:16 [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks Breno Leitao
2026-06-12 16:52 ` Lance Yang
2026-06-12 17:11 ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox