The Linux Kernel Mailing List
* [PATCH v3 0/3] mm/kmemleak: avoid soft lockup when scanning task stacks
@ 2026-06-15 17:49 Breno Leitao
  2026-06-15 17:49 ` [PATCH v3 1/3] " Breno Leitao
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Breno Leitao @ 2026-06-15 17:49 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, lance.yang, Davidlohr Bueso,
	Oleg Nesterov, Qian Cai
  Cc: oleg, sj, linux-mm, linux-kernel, Breno Leitao, kernel-team,
	stable

kmemleak_scan() scans every task stack under one rcu_read_lock() with no
reschedule point, which can trip the soft lockup watchdog on hosts with
very many threads.

Depending on the workload and host configuration, the watchdog then
reports:

      watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
       scan_block
       kmemleak_scan
       kmemleak_scan_thread
       kthread

Patch 1 walks the tasks with find_ge_pid() so the scan reschedules between
tasks.

Patches 2-3 let the scan loops stop early once a scan is interrupted.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v3:
- Rework the task stack walk to use find_ge_pid() instead of v1's array
  and v2's rcu_lock_break() helper (Catalin).
- Add two follow-up patches letting scan_block() report an interrupted
  scan so the scan loops stop early.
- Link to v2: https://lore.kernel.org/r/20260612-kmemleak-stack-resched-v2-1-53240de79e88@debian.org

Changes in v2:
- Do not create the nasty array, but use the same pattern as
  kernel/hung_task.c.
- Link to v1: https://lore.kernel.org/r/20260611-kmemleak-stack-resched-v1-1-d6248ade5f4a@debian.org

---
Breno Leitao (3):
      mm/kmemleak: avoid soft lockup when scanning task stacks
      mm/kmemleak: stop the task stack scan early when interrupted
      mm/kmemleak: stop the per-cpu and struct page scans early too

 mm/kmemleak.c | 88 +++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 65 insertions(+), 23 deletions(-)
---
base-commit: abe651837cb394f76d738a7a747322fca3bf17ba
change-id: 20260611-kmemleak-stack-resched-01ed72858a7f

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* [PATCH v3 1/3] mm/kmemleak: avoid soft lockup when scanning task stacks
  2026-06-15 17:49 [PATCH v3 0/3] mm/kmemleak: avoid soft lockup when scanning task stacks Breno Leitao
@ 2026-06-15 17:49 ` Breno Leitao
  2026-06-15 18:24   ` Catalin Marinas
  2026-06-15 18:46   ` Davidlohr Bueso
  2026-06-15 17:49 ` [PATCH v3 2/3] mm/kmemleak: stop the task stack scan early when interrupted Breno Leitao
  2026-06-15 17:49 ` [PATCH v3 3/3] mm/kmemleak: stop the per-cpu and struct page scans early too Breno Leitao
  2 siblings, 2 replies; 8+ messages in thread
From: Breno Leitao @ 2026-06-15 17:49 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, lance.yang, Davidlohr Bueso,
	Oleg Nesterov, Qian Cai
  Cc: oleg, sj, linux-mm, linux-kernel, Breno Leitao, kernel-team,
	stable

kmemleak_scan() walks every thread and scans its kernel stack under a
single rcu_read_lock() with no reschedule point. On a host with very
many threads -- amplified by KASAN/lockdep in debug builds -- this loop
can hog a CPU long enough to trip the soft lockup watchdog:

  watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
   scan_block
   kmemleak_scan
   kmemleak_scan_thread
   kthread

A cond_resched() cannot be added directly: the loop runs inside an RCU
read-side critical section.

Walk the tasks one PID at a time with find_ge_pid(), taking the RCU read
lock only to look up and pin each task. The stack is then scanned with no
lock held, so cond_resched() runs between tasks and the scan stops early
on scan_should_stop(). This follows the next_tgid()/task_seq_get_next()
iteration pattern and keeps each RCU critical section short.

Fixes: c4b28963fd79 ("mm/kmemleak: rely on rcu for task stack scanning")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/kmemleak.c | 51 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 13 deletions(-)
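
Not part of the patch: for readers who want to try the iteration pattern
outside the kernel tree, the look-up, pin, unlock, scan sequence described
above can be modelled in plain userspace C. This is only a sketch under
stated assumptions. Every model_* name is hypothetical, a counter stands in
for the RCU read-side section, and a sorted array stands in for the PID
namespace that find_ge_pid() searches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names exist in the kernel. */
struct model_task {
	int id;		/* plays the role of the PID number */
	int refs;	/* plays the role of the task_struct refcount */
};

/* Sparse, ascending ids, like PIDs with gaps between them. */
static struct model_task model_tasks[] = {
	{ .id = 1 }, { .id = 2 }, { .id = 7 }, { .id = 40 },
};
#define MODEL_NR_TASKS (sizeof(model_tasks) / sizeof(model_tasks[0]))

static int model_lock_depth;	/* > 0 while the modelled read lock is held */
static int model_scanned;	/* number of stacks "scanned" so far */

/* Lowest-id task with id >= nr, or NULL (the role find_ge_pid() plays). */
static struct model_task *model_find_ge(int nr)
{
	size_t i;

	for (i = 0; i < MODEL_NR_TASKS; i++)
		if (model_tasks[i].id >= nr)
			return &model_tasks[i];
	return NULL;
}

/* The long-running step; it must never run with the lock held. */
static void model_scan_stack(struct model_task *t)
{
	assert(model_lock_depth == 0);
	assert(t->refs > 0);
	model_scanned++;
}

/* Walk every task; returns the number of tasks visited. */
static int model_walk(void)
{
	struct model_task *t;
	int nr = 1, visited = 0;

	do {
		model_lock_depth++;		/* short critical section */
		t = model_find_ge(nr);
		if (t) {
			nr = t->id + 1;		/* resume point for the next lookup */
			t->refs++;		/* pin before dropping the lock */
		}
		model_lock_depth--;

		if (t) {
			model_scan_stack(t);	/* unlocked, may take long */
			t->refs--;		/* unpin */
			visited++;
		}
		/* a reschedule point would go here, outside the lock */
	} while (t);

	return visited;
}
```

Calling model_walk() visits the four entries in id order. The point of the
model is that the resume key (nr) is a plain integer rather than a pointer
into the list, so nothing has to stay valid while the lock is dropped.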

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 7c7ba17ce7af0..a7786b6bc174e 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1695,6 +1695,42 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
 	put_object(object);
 }
 
+/*
+ * Scan all task kernel stacks, rescheduling between tasks. Each task is looked
+ * up and pinned within its own RCU read-side section, so no lock is held across
+ * the scan and the walk cannot trip the soft lockup watchdog.
+ */
+static void kmemleak_scan_task_stacks(void)
+{
+	struct pid *pid;
+	int nr = 1;
+
+	do {
+		struct task_struct *p = NULL;
+
+		rcu_read_lock();
+		pid = find_ge_pid(nr, &init_pid_ns);
+		if (pid) {
+			nr = pid_nr(pid) + 1;
+			p = pid_task(pid, PIDTYPE_PID);
+			if (p)
+				get_task_struct(p);
+		}
+		rcu_read_unlock();
+
+		if (p) {
+			void *stack = try_get_task_stack(p);
+
+			if (stack) {
+				scan_block(stack, stack + THREAD_SIZE, NULL);
+				put_task_stack(p);
+			}
+			put_task_struct(p);
+		}
+		cond_resched();
+	} while (pid && !scan_should_stop());
+}
+
 /*
  * Print one leak inline. The hex dump is gated on OBJECT_ALLOCATED so it
  * does not touch user memory that was freed concurrently; the rest of the
@@ -1884,19 +1920,8 @@ static void kmemleak_scan(void)
 	/*
 	 * Scanning the task stacks (may introduce false negatives).
 	 */
-	if (kmemleak_stack_scan) {
-		struct task_struct *p, *g;
-
-		rcu_read_lock();
-		for_each_process_thread(g, p) {
-			void *stack = try_get_task_stack(p);
-			if (stack) {
-				scan_block(stack, stack + THREAD_SIZE, NULL);
-				put_task_stack(p);
-			}
-		}
-		rcu_read_unlock();
-	}
+	if (kmemleak_stack_scan)
+		kmemleak_scan_task_stacks();
 
 	/*
 	 * Scan the objects already referenced from the sections scanned

-- 
2.53.0-Meta



* [PATCH v3 2/3] mm/kmemleak: stop the task stack scan early when interrupted
  2026-06-15 17:49 [PATCH v3 0/3] mm/kmemleak: avoid soft lockup when scanning task stacks Breno Leitao
  2026-06-15 17:49 ` [PATCH v3 1/3] " Breno Leitao
@ 2026-06-15 17:49 ` Breno Leitao
  2026-06-15 18:26   ` Catalin Marinas
  2026-06-15 17:49 ` [PATCH v3 3/3] mm/kmemleak: stop the per-cpu and struct page scans early too Breno Leitao
  2 siblings, 1 reply; 8+ messages in thread
From: Breno Leitao @ 2026-06-15 17:49 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, lance.yang, Davidlohr Bueso,
	Oleg Nesterov, Qian Cai
  Cc: oleg, sj, linux-mm, linux-kernel, Breno Leitao, kernel-team

scan_block() already checks scan_should_stop() for every pointer and
bails out of the current block, but the task stack walk cannot tell and
keeps issuing a separate scan_should_stop() between every task.

Return that status from scan_block() and use it as the task stack loop
condition, so the walk stops as soon as a scan is interrupted.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/kmemleak.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)
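
Not part of the patch: a minimal userspace sketch of the idea, assuming a
flag in place of scan_should_stop() and a zero word as an arbitrary stop
trigger. All model_* names are hypothetical. The inner scan reports that it
was interrupted, and the caller uses that return value as its loop condition
instead of polling the flag again.

```c
#include <assert.h>
#include <stddef.h>

static int model_stop_requested;	/* stand-in for the scan_should_stop() state */
static int model_words_seen;		/* words actually examined */

/* Scan n words; return non-zero if interrupted before the end. */
static int model_scan_block(const unsigned long *words, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (model_stop_requested)
			return 1;
		model_words_seen++;
		/* Hypothetical trigger: a zero word requests a stop. */
		if (words[i] == 0)
			model_stop_requested = 1;
	}
	return 0;
}

/* Scan blocks in order; return how many blocks were started. */
static int model_scan_all(const unsigned long blocks[][4], size_t nblocks)
{
	size_t b;
	int stop = 0, started = 0;

	for (b = 0; b < nblocks && !stop; b++) {
		started++;
		stop = model_scan_block(blocks[b], 4);
	}
	return started;
}
```

With three blocks where the second contains a zero word, the walk starts two
blocks and never touches the third.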

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index a7786b6bc174e..916af7cecb3b4 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1524,22 +1524,25 @@ static int scan_should_stop(void)
 
 /*
  * Scan a memory block (exclusive range) for valid pointers and add those
- * found to the gray list.
+ * found to the gray list. Return non-zero if the scan was interrupted.
  */
-static void scan_block(void *_start, void *_end,
-		       struct kmemleak_object *scanned)
+static int scan_block(void *_start, void *_end,
+		      struct kmemleak_object *scanned)
 {
 	unsigned long *ptr;
 	unsigned long *start = PTR_ALIGN(_start, BYTES_PER_POINTER);
 	unsigned long *end = _end - (BYTES_PER_POINTER - 1);
 	unsigned long flags;
+	int stop = 0;
 
 	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 	for (ptr = start; ptr < end; ptr++) {
 		unsigned long pointer;
 
-		if (scan_should_stop())
+		if (scan_should_stop()) {
+			stop = 1;
 			break;
+		}
 
 		kasan_disable_current();
 		pointer = *(unsigned long *)kasan_reset_tag((void *)ptr);
@@ -1549,6 +1552,8 @@ static void scan_block(void *_start, void *_end,
 		pointer_update_refs(scanned, pointer, OBJECT_PERCPU);
 	}
 	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
+
+	return stop;
 }
 
 /*
@@ -1704,6 +1709,7 @@ static void kmemleak_scan_task_stacks(void)
 {
 	struct pid *pid;
 	int nr = 1;
+	int stop = 0;
 
 	do {
 		struct task_struct *p = NULL;
@@ -1722,13 +1728,13 @@ static void kmemleak_scan_task_stacks(void)
 			void *stack = try_get_task_stack(p);
 
 			if (stack) {
-				scan_block(stack, stack + THREAD_SIZE, NULL);
+				stop = scan_block(stack, stack + THREAD_SIZE, NULL);
 				put_task_stack(p);
 			}
 			put_task_struct(p);
 		}
 		cond_resched();
-	} while (pid && !scan_should_stop());
+	} while (pid && !stop);
 }
 
 /*

-- 
2.53.0-Meta



* [PATCH v3 3/3] mm/kmemleak: stop the per-cpu and struct page scans early too
  2026-06-15 17:49 [PATCH v3 0/3] mm/kmemleak: avoid soft lockup when scanning task stacks Breno Leitao
  2026-06-15 17:49 ` [PATCH v3 1/3] " Breno Leitao
  2026-06-15 17:49 ` [PATCH v3 2/3] mm/kmemleak: stop the task stack scan early when interrupted Breno Leitao
@ 2026-06-15 17:49 ` Breno Leitao
  2026-06-15 18:27   ` Catalin Marinas
  2 siblings, 1 reply; 8+ messages in thread
From: Breno Leitao @ 2026-06-15 17:49 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton, lance.yang, Davidlohr Bueso,
	Oleg Nesterov, Qian Cai
  Cc: oleg, sj, linux-mm, linux-kernel, Breno Leitao, kernel-team

The per-cpu and struct page scan loops have no reschedule-stop check of
their own: once a scan is interrupted they keep calling scan_block() for
every remaining block, which scans nothing useful.

Propagate scan_block()'s interrupted status through scan_large_block()
and break both loops as soon as it is set.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/kmemleak.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)
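
Not part of the patch: the propagation through a chunking helper can be
sketched in userspace C as follows. The assumptions are a byte budget in
place of scan_should_stop() and a tiny chunk size in place of MAX_SCAN_SIZE.
All model_* and MODEL_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_SCAN 4	/* hypothetical chunk size, in bytes */

static size_t model_budget;		/* bytes that may be scanned before a stop */
static size_t model_scanned_bytes;	/* bytes actually scanned */
static int model_chunk_calls;		/* calls made to model_scan_block() */

/* Scan [start, end); return non-zero once the budget is exhausted. */
static int model_scan_block(const char *start, const char *end)
{
	const char *p;

	model_chunk_calls++;
	for (p = start; p < end; p++) {
		if (model_budget == 0)
			return 1;
		model_budget--;
		model_scanned_bytes++;
	}
	return 0;
}

/* Scan a region in MODEL_MAX_SCAN chunks, passing an interruption upwards. */
static int model_scan_large_block(const char *start, const char *end)
{
	while (start < end) {
		size_t left = (size_t)(end - start);
		size_t step = left < MODEL_MAX_SCAN ? left : MODEL_MAX_SCAN;

		if (model_scan_block(start, start + step))
			return 1;
		start += step;
	}
	return 0;
}

/* Scan nregions regions of len bytes; return how many regions were started. */
static int model_scan_regions(const char *base, size_t len, size_t nregions)
{
	size_t r;
	int started = 0;

	for (r = 0; r < nregions; r++) {
		started++;
		if (model_scan_large_block(base + r * len, base + (r + 1) * len))
			break;
	}
	return started;
}
```

Without the early return and the break, every remaining chunk would still
cost one call that exits on its first check, which is the wasted work the
commit message describes.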

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 916af7cecb3b4..7fa7124727e36 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1558,18 +1558,22 @@ static int scan_block(void *_start, void *_end,
 
 /*
  * Scan a large memory block in MAX_SCAN_SIZE chunks to reduce the latency.
+ * Return non-zero if the scan was interrupted.
  */
 #ifdef CONFIG_SMP
-static void scan_large_block(void *start, void *end)
+static int scan_large_block(void *start, void *end)
 {
 	void *next;
 
 	while (start < end) {
 		next = min(start + MAX_SCAN_SIZE, end);
-		scan_block(start, next, NULL);
+		if (scan_block(start, next, NULL))
+			return 1;
 		start = next;
 		cond_resched();
 	}
+
+	return 0;
 }
 #endif
 
@@ -1889,9 +1893,11 @@ static void kmemleak_scan(void)
 
 #ifdef CONFIG_SMP
 	/* per-cpu sections scanning */
-	for_each_possible_cpu(i)
-		scan_large_block(__per_cpu_start + per_cpu_offset(i),
-				 __per_cpu_end + per_cpu_offset(i));
+	for_each_possible_cpu(i) {
+		if (scan_large_block(__per_cpu_start + per_cpu_offset(i),
+				     __per_cpu_end + per_cpu_offset(i)))
+			break;
+	}
 #endif
 
 	/*
@@ -1902,6 +1908,7 @@ static void kmemleak_scan(void)
 		unsigned long start_pfn = zone->zone_start_pfn;
 		unsigned long end_pfn = zone_end_pfn(zone);
 		unsigned long pfn;
+		int stop = 0;
 
 		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 			struct page *page = pfn_to_online_page(pfn);
@@ -1918,8 +1925,12 @@ static void kmemleak_scan(void)
 			/* only scan if page is in use */
 			if (page_count(page) == 0)
 				continue;
-			scan_block(page, page + 1, NULL);
+			stop = scan_block(page, page + 1, NULL);
+			if (stop)
+				break;
 		}
+		if (stop)
+			break;
 	}
 	put_online_mems();
 

-- 
2.53.0-Meta



* Re: [PATCH v3 1/3] mm/kmemleak: avoid soft lockup when scanning task stacks
  2026-06-15 17:49 ` [PATCH v3 1/3] " Breno Leitao
@ 2026-06-15 18:24   ` Catalin Marinas
  2026-06-15 18:46   ` Davidlohr Bueso
  1 sibling, 0 replies; 8+ messages in thread
From: Catalin Marinas @ 2026-06-15 18:24 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, lance.yang, Davidlohr Bueso, Oleg Nesterov,
	Qian Cai, sj, linux-mm, linux-kernel, kernel-team, stable

On Mon, Jun 15, 2026 at 10:49:06AM -0700, Breno Leitao wrote:
> kmemleak_scan() walks every thread and scans its kernel stack under a
> single rcu_read_lock() with no reschedule point. On a host with very
> many threads -- amplified by KASAN/lockdep in debug builds -- this loop
> can hog a CPU long enough to trip the soft lockup watchdog:
> 
>   watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
>    scan_block
>    kmemleak_scan
>    kmemleak_scan_thread
>    kthread
> 
> A cond_resched() cannot be added directly: the loop runs inside an RCU
> read-side critical section.
> 
> Walk the tasks one PID at a time with find_ge_pid(), taking the RCU read
> lock only to look up and pin each task. The stack is then scanned with no
> lock held, so cond_resched() runs between tasks and the scan stops early
> on scan_should_stop(). This follows the next_tgid()/task_seq_get_next()
> iteration pattern and keeps each RCU critical section short.
> 
> Fixes: c4b28963fd79 ("mm/kmemleak: rely on rcu for task stack scanning")
> Cc: stable@vger.kernel.org
> Signed-off-by: Breno Leitao <leitao@debian.org>

I think the Fixes is just a marker to tell how far back to go. Before
the above commit, we used a read_lock(&tasklist_lock) which probably had
similar issues.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks.

-- 
Catalin


* Re: [PATCH v3 2/3] mm/kmemleak: stop the task stack scan early when interrupted
  2026-06-15 17:49 ` [PATCH v3 2/3] mm/kmemleak: stop the task stack scan early when interrupted Breno Leitao
@ 2026-06-15 18:26   ` Catalin Marinas
  0 siblings, 0 replies; 8+ messages in thread
From: Catalin Marinas @ 2026-06-15 18:26 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, lance.yang, Davidlohr Bueso, Oleg Nesterov, sj,
	linux-mm, linux-kernel, kernel-team

On Mon, Jun 15, 2026 at 10:49:07AM -0700, Breno Leitao wrote:
> scan_block() already checks scan_should_stop() for every pointer and
> bails out of the current block, but the task stack walk cannot tell and
> keeps issuing a separate scan_should_stop() between every task.
> 
> Return that status from scan_block() and use it as the task stack loop
> condition, so the walk stops as soon as a scan is interrupted.
> 
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>


* Re: [PATCH v3 3/3] mm/kmemleak: stop the per-cpu and struct page scans early too
  2026-06-15 17:49 ` [PATCH v3 3/3] mm/kmemleak: stop the per-cpu and struct page scans early too Breno Leitao
@ 2026-06-15 18:27   ` Catalin Marinas
  0 siblings, 0 replies; 8+ messages in thread
From: Catalin Marinas @ 2026-06-15 18:27 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, lance.yang, Davidlohr Bueso, Oleg Nesterov, sj,
	linux-mm, linux-kernel, kernel-team

On Mon, Jun 15, 2026 at 10:49:08AM -0700, Breno Leitao wrote:
> The per-cpu and struct page scan loops have no reschedule-stop check of
> their own: once a scan is interrupted they keep calling scan_block() for
> every remaining block, which scans nothing useful.
> 
> Propagate scan_block()'s interrupted status through scan_large_block()
> and break both loops as soon as it is set.
> 
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>


* Re: [PATCH v3 1/3] mm/kmemleak: avoid soft lockup when scanning task stacks
  2026-06-15 17:49 ` [PATCH v3 1/3] " Breno Leitao
  2026-06-15 18:24   ` Catalin Marinas
@ 2026-06-15 18:46   ` Davidlohr Bueso
  1 sibling, 0 replies; 8+ messages in thread
From: Davidlohr Bueso @ 2026-06-15 18:46 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Catalin Marinas, Andrew Morton, lance.yang, Oleg Nesterov,
	Qian Cai, sj, linux-mm, linux-kernel, kernel-team, stable

On Mon, 15 Jun 2026, Breno Leitao wrote:

>kmemleak_scan() walks every thread and scans its kernel stack under a
>single rcu_read_lock() with no reschedule point. On a host with very
>many threads -- amplified by KASAN/lockdep in debug builds -- this loop
>can hog a CPU long enough to trip the soft lockup watchdog:
>
>  watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
>   scan_block
>   kmemleak_scan
>   kmemleak_scan_thread
>   kthread
>
>A cond_resched() cannot be added directly: the loop runs inside an RCU
>read-side critical section.
>
>Walk the tasks one PID at a time with find_ge_pid(), taking the RCU read
>lock only to look up and pin each task. The stack is then scanned with no
>lock held, so cond_resched() runs between tasks and the scan stops early
>on scan_should_stop(). This follows the next_tgid()/task_seq_get_next()
>iteration pattern and keeps each RCU critical section short.
>
>Fixes: c4b28963fd79 ("mm/kmemleak: rely on rcu for task stack scanning")
>Cc: stable@vger.kernel.org
>Signed-off-by: Breno Leitao <leitao@debian.org>

LGTM

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
