The Linux Kernel Mailing List
* [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker
@ 2026-06-30 16:15 Breno Leitao
  2026-06-30 16:15 ` [PATCH v2 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Breno Leitao @ 2026-06-30 16:15 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, Song Liu
  Cc: linux-kernel, pmladek, marco.crivellari, david.dai, Breno Leitao,
	kernel-team

The workqueue watchdog fires when a pool stops making progress, but the
diagnostics it printed did not always point at the culprit.

Commit 8823eaef45da7 ("workqueue: Show all busy workers in stall
diagnostics") made the watchdog dump every in-flight worker in the
stalled pool's busy_hash, including workers that are not running on the
CPU.  As Petr Mladek pointed out, that is rarely useful: a worker that is
merely sleeping inside a work item does not, by itself, hold up the pool
-- the scheduler calls wq_worker_sleeping() and the pool wakes (or forks)
another worker to keep going.  Dumping all of those sleeping workers
mostly adds noise and can do more harm than good.

The condition actually worth reporting is the opposite one: a pool that
is stalled with *no* running worker at all.  That means the pool failed
to get a worker onto the CPU -- it could not wake an idle worker, could
not fork a new one, or the CPU is busy running something that is not a
workqueue worker.  In that case the previous code printed an empty
backtrace section and gave no hint about what went wrong.

Following Petr's suggestion, this series reworks the diagnostics:

  1) workqueue: only show running workers in stall diagnostics
     Restore the task_is_running() filter (reverting the behaviour of
     8823eaef45da7) so only running workers are dumped, and explicitly
     report when a stalled pool has no running worker, printing the pool
     id, CPU, idle state and worker counts.

  2) workqueue: trigger a single-CPU backtrace for stalled pools
     Trigger a backtrace of the stalled CPU so whatever task is actually
     occupying it -- workqueue worker or not -- is captured.

  3) workqueue: dump the last woken worker for stalled pools
     Record the worker last woken by kick_pool() and dump its backtrace.
     It is the prime suspect: the idle worker kicked to take over when
     the previous running worker went to sleep.

The reworked diagnostics have been running on the Meta fleet (backported
to 6.16) and finally explained a long-standing arm64 stall that the old
output could not: an EFI runtime call wedging an efi_rts_wq worker on a
machine without NMI.  The single-CPU backtrace from patch 2 pinpointed
efi_call_rts() as the stuck task [1].

Example output, reproduced with the in-tree stall detector sample
(samples/workqueue/stall_detector/wq_stall.c); the lines added by this
series are marked "<-- new":

  BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 30s!
  Showing busy workqueues and worker pools:
  workqueue events: flags=0x100
    pwq 10: cpus=2 node=0 flags=0x0 nice=0 active=5 refcnt=6
      in-flight: 58:stall_work1_fn [wq_stall] for 30s
      pending: stall_work2_fn [wq_stall], ...
  pool 10: cpus=2 node=0 flags=0x0 nice=0 hung=30s workers=2 idle: 33
  Showing backtraces of busy workers in stalled worker pools:
  pool 10: no worker in running state, cpu=2 is idle (nr_workers=2 nr_idle=1)   <-- new
  The pool might have trouble waking an idle worker.                            <-- new
  Backtrace of last woken worker:                                              <-- new
  task:kworker/2:1     state:I  pid:58                                          <-- new
   __schedule+0x8fd/0xfc0                                                       <-- new
   stall_work1_fn+0xb2/0x100 [wq_stall]                                         <-- new
   process_scheduled_works+0x254/0x4e0                                          <-- new
   worker_thread+0x222/0x340                                                    <-- new
  Sending NMI from CPU 0 to CPUs 2:                                             <-- new
  NMI backtrace for cpu 2  (idle: default_idle / do_idle)                       <-- new

[1] https://lore.kernel.org/all/20260616-efi_timeout-v3-0-76dd1d26657b@debian.org/

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v1: https://lore.kernel.org/r/20260616-wq_dump_petr-v1-0-b57473ca6d18@debian.org

---
Breno Leitao (3):
      workqueue: only show running workers in stall diagnostics
      workqueue: trigger a single-CPU backtrace for stalled pools
      workqueue: dump the last woken worker for stalled pools

 kernel/workqueue.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 71 insertions(+), 5 deletions(-)
---
base-commit: 7de6ae9e12207ec146f2f3f1e58d1a99317e88bc
change-id: 20260616-wq_dump_petr-7fcf43940204

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* [PATCH v2 1/3] workqueue: only show running workers in stall diagnostics
  2026-06-30 16:15 [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
@ 2026-06-30 16:15 ` Breno Leitao
  2026-06-30 16:15 ` [PATCH v2 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Breno Leitao @ 2026-06-30 16:15 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, Song Liu
  Cc: linux-kernel, pmladek, marco.crivellari, david.dai, Breno Leitao,
	kernel-team

show_cpu_pool_busy_workers() dumps every in-flight worker in the pool's
busy_hash, including workers that are not currently running on the CPU.
Restore the task_is_running() filter so only running workers are dumped.

When no running worker is found the pool may be stuck, unable to wake an
idle worker to process pending work, and the watchdog would otherwise
give no feedback.  Add show_pool_no_running_worker() to report the pool
id, CPU, idle state, and worker counts in that case.
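
The filtering and reporting rule described above can be sketched in
userspace C.  This is only an illustrative model, not the kernel code:
struct model_worker, struct model_pool and both functions are hypothetical
stand-ins, and the "running" and "cpu_idle" fields merely represent what
task_is_running() and idle_cpu() would return in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct model_worker {
	int pid;
	bool running;		/* stands in for task_is_running(worker->task) */
};

struct model_pool {
	int id;
	int cpu;
	bool cpu_idle;		/* stands in for idle_cpu(pool->cpu) */
	int nr_workers;
	int nr_idle;
	const struct model_worker *busy;	/* stands in for pool->busy_hash */
	size_t nr_busy;
};

/* Count busy workers that are running; sleeping workers are skipped. */
static size_t model_count_running(const struct model_pool *pool)
{
	size_t i, n = 0;

	assert(pool != NULL);
	for (i = 0; i < pool->nr_busy; i++) {
		if (pool->busy[i].running)
			n++;
	}
	return n;
}

/*
 * Write the "no running worker" line into buf and return true, or return
 * false (leaving buf untouched) when at least one busy worker is running.
 */
static bool model_report_no_running(const struct model_pool *pool,
				    char *buf, size_t len)
{
	if (model_count_running(pool) > 0)
		return false;

	snprintf(buf, len,
		 "pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)",
		 pool->id, pool->cpu, pool->cpu_idle ? "idle" : "busy",
		 pool->nr_workers, pool->nr_idle);
	return true;
}
```

In this model a pool whose only busy worker is sleeping produces the
report line, while a pool with any running busy worker produces none.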

The pool info message is printed inside pool->lock using
printk_deferred_enter/exit, the same pattern used by the existing
busy-worker loop, to avoid deadlocks with console drivers that queue
work while holding locks also taken in their write paths.

This has been running on the Meta fleet for a while and caught some real
issues, for instance EFI runtime calls stalling the workqueue [1].

Link: https://lore.kernel.org/all/20260616-efi_timeout-v3-0-76dd1d26657b@debian.org/ [1]
Suggested-by: Petr Mladek <pmladek@suse.com>
Fixes: 8823eaef45da7 ("workqueue: Show all busy workers in stall diagnostics")
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 kernel/workqueue.c | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78f25afb4a9d6..efbac160b7628 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7693,13 +7693,31 @@ module_param_named(panic_on_stall_time, wq_panic_on_stall_time, uint, 0644);
 MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds (0=disabled)");
 
 /*
- * Show workers that might prevent the processing of pending work items.
- * A busy worker that is not running on the CPU (e.g. sleeping in
- * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
- * effectively as a CPU-bound one, so dump every in-flight worker.
+ * Report that a pool has no worker in running state, which is a sign that the
+ * pool may be stuck. Print pool info. Must be called with pool->lock held;
+ * the output is wrapped in printk_deferred_enter/exit by this function.
+ */
+static void show_pool_no_running_worker(struct worker_pool *pool)
+{
+	lockdep_assert_held(&pool->lock);
+
+	printk_deferred_enter();
+	pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+		pool->id, pool->cpu,
+		idle_cpu(pool->cpu) ? "idle" : "busy",
+		pool->nr_workers, pool->nr_idle);
+	pr_info("The pool might have trouble waking an idle worker.\n");
+	printk_deferred_exit();
+}
+
+/*
+ * Show running workers that might prevent the processing of pending work items.
+ * If no running worker is found, the pool may be stuck waiting for an idle
+ * worker to be woken, so report the pool state.
  */
 static void show_cpu_pool_busy_workers(struct worker_pool *pool)
 {
+	bool found_running = false;
 	struct worker *worker;
 	unsigned long irq_flags;
 	int bkt;
@@ -7707,6 +7725,11 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
 	raw_spin_lock_irqsave(&pool->lock, irq_flags);
 
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
+		/* Skip workers that are not actively running on the CPU. */
+		if (!task_is_running(worker->task))
+			continue;
+
+		found_running = true;
 		/*
 		 * Defer printing to avoid deadlocks in console
 		 * drivers that queue work while holding locks
@@ -7720,6 +7743,13 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
 		printk_deferred_exit();
 	}
 
+	/*
+	 * If no running worker was found, the pool is likely stuck. Print pool
+	 * state.
+	 */
+	if (!found_running)
+		show_pool_no_running_worker(pool);
+
 	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
 }
 

-- 
2.53.0-Meta



* [PATCH v2 2/3] workqueue: trigger a single-CPU backtrace for stalled pools
  2026-06-30 16:15 [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
  2026-06-30 16:15 ` [PATCH v2 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
@ 2026-06-30 16:15 ` Breno Leitao
  2026-06-30 16:15 ` [PATCH v2 3/3] workqueue: dump the last woken worker " Breno Leitao
  2026-06-30 16:54 ` [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker Tejun Heo
  3 siblings, 0 replies; 6+ messages in thread
From: Breno Leitao @ 2026-06-30 16:15 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, Song Liu
  Cc: linux-kernel, pmladek, marco.crivellari, david.dai, Breno Leitao,
	kernel-team

When a CPU pool is stalled with no running worker, the task occupying the
CPU may not be a workqueue worker at all.  Trigger a single-CPU backtrace
for the stalled CPU to capture what it is currently executing.

The CPU is snapshotted under pool->lock and the backtrace is triggered
after releasing the lock to avoid any potential issues with NMI delivery.

Skip the backtrace when the CPU is offline.  A pool disassociated by CPU
hotplug keeps its pool->cpu, and an NMI to an offline CPU is never acked,
so nmi_trigger_cpumask_backtrace() would busy-wait for its full timeout
in the watchdog's timer context.
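
The "snapshot under the lock, act after unlock, skip offline CPUs"
ordering can be sketched in userspace C as below.  This is a model under
stated assumptions, not the kernel implementation: the C11 atomic_flag
spinlock stands in for pool->lock, model_cpu_online[] stands in for
cpu_online(), and model_trigger_backtrace() is a hypothetical placeholder
that only records which CPU would have been targeted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for cpu_online(); all CPUs start online. */
static bool model_cpu_online[MODEL_NR_CPUS] = { true, true, true, true };

/* CPU most recently targeted by the placeholder backtrace, or -1. */
static int model_backtrace_cpu = -1;

struct model_pool {
	atomic_flag lock;	/* stands in for pool->lock */
	int cpu;
	int nr_running;		/* number of running busy workers */
};

static void model_lock(struct model_pool *pool)
{
	while (atomic_flag_test_and_set_explicit(&pool->lock,
						 memory_order_acquire))
		;		/* spin until the flag is released */
}

static void model_unlock(struct model_pool *pool)
{
	atomic_flag_clear_explicit(&pool->lock, memory_order_release);
}

/* Placeholder for trigger_single_cpu_backtrace(): only records the CPU. */
static void model_trigger_backtrace(int cpu)
{
	model_backtrace_cpu = cpu;
}

/* Returns true when a backtrace was requested for the pool's CPU. */
static bool model_check_pool(struct model_pool *pool)
{
	bool found_running;
	int cpu;

	model_lock(pool);
	cpu = pool->cpu;			/* snapshot inside the lock */
	found_running = pool->nr_running > 0;
	model_unlock(pool);

	/* Everything below runs with the lock already released. */
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (found_running || !model_cpu_online[cpu])
		return false;

	model_trigger_backtrace(cpu);
	return true;
}
```

The point of the ordering is that the backtrace request only uses the
local copies taken under the lock, so it never touches pool state after
the unlock.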

Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 kernel/workqueue.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index efbac160b7628..7d30e23c84087 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7720,10 +7720,13 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
 	bool found_running = false;
 	struct worker *worker;
 	unsigned long irq_flags;
-	int bkt;
+	int cpu, bkt;
 
 	raw_spin_lock_irqsave(&pool->lock, irq_flags);
 
+	/* Snapshot cpu inside the lock to safely use it after unlock. */
+	cpu = pool->cpu;
+
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
 		/* Skip workers that are not actively running on the CPU. */
 		if (!task_is_running(worker->task))
@@ -7751,6 +7754,15 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
 		show_pool_no_running_worker(pool);
 
 	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+	/*
+	 * Trigger a backtrace on the stalled CPU to capture what it is
+	 * currently executing. Skip an offline CPU, whose NMI is never acked
+	 * and would make the backtrace busy-wait until it times out. Done
+	 * after releasing the lock to avoid issues with NMI delivery.
+	 */
+	if (!found_running && cpu_online(cpu))
+		trigger_single_cpu_backtrace(cpu);
 }
 
 static void show_cpu_pools_busy_workers(void)

-- 
2.53.0-Meta



* [PATCH v2 3/3] workqueue: dump the last woken worker for stalled pools
  2026-06-30 16:15 [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
  2026-06-30 16:15 ` [PATCH v2 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
  2026-06-30 16:15 ` [PATCH v2 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
@ 2026-06-30 16:15 ` Breno Leitao
  2026-06-30 16:54 ` [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker Tejun Heo
  3 siblings, 0 replies; 6+ messages in thread
From: Breno Leitao @ 2026-06-30 16:15 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, Song Liu
  Cc: linux-kernel, pmladek, marco.crivellari, david.dai, Breno Leitao,
	kernel-team

To identify the task most likely responsible for a stall, add
last_woken_worker (L: pool->lock) to worker_pool and record it in
kick_pool() just before wake_up_process().  This captures the idle
worker that was kicked to take over when the last running worker went to
sleep; if the pool is now stuck with no running worker, that task is the
prime suspect and its backtrace is dumped by show_pool_no_running_worker().

Using struct worker * rather than struct task_struct * avoids any
lifetime concern: workers are only destroyed via set_worker_dying()
which requires pool->lock, and set_worker_dying() clears
last_woken_worker when the dying worker matches.
show_cpu_pool_busy_workers() holds pool->lock while calling
sched_show_task(), so last_woken_worker is either NULL or points to a
live worker with a valid task.  More precisely, set_worker_dying() clears
last_woken_worker before setting WORKER_DIE, so a non-NULL
last_woken_worker means the kthread has not yet exited and worker->task
is still alive.
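
The lifetime rule above (a non-NULL last_woken_worker always refers to a
worker that has not been marked dying) can be sketched as a small
userspace model.  The names below are hypothetical stand-ins, and locking
is left out for brevity: in the patch, all three operations run with
pool->lock held, which is what makes the invariant hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct model_worker {
	int pid;
	bool dying;		/* stands in for the WORKER_DIE flag */
};

struct model_pool {
	struct model_worker *last_woken_worker;
};

/* Models kick_pool(): remember which idle worker is being woken. */
static void model_kick_pool(struct model_pool *pool,
			    struct model_worker *worker)
{
	assert(!worker->dying);
	pool->last_woken_worker = worker;
}

/*
 * Models set_worker_dying(): drop the pointer first, then mark the
 * worker as dying, so the pool never refers to a dying worker.
 */
static void model_set_worker_dying(struct model_pool *pool,
				   struct model_worker *worker)
{
	if (pool->last_woken_worker == worker)
		pool->last_woken_worker = NULL;
	worker->dying = true;
}

/* Models the diagnostic path: pid of the last woken worker, or -1. */
static int model_last_woken_pid(const struct model_pool *pool)
{
	if (!pool->last_woken_worker)
		return -1;
	assert(!pool->last_woken_worker->dying);
	return pool->last_woken_worker->pid;
}
```

Marking an unrelated worker as dying leaves the recorded pointer alone;
only the matching worker clears it.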

Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 kernel/workqueue.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7d30e23c84087..3580c19150721 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -226,6 +226,7 @@ struct worker_pool {
 						/* L: hash of busy workers */
 
 	struct worker		*manager;	/* L: purely informational */
+	struct worker		*last_woken_worker; /* L: last worker woken by kick_pool() */
 	struct list_head	workers;	/* A: attached workers */
 
 	struct ida		worker_ida;	/* worker IDs for task name */
@@ -1310,6 +1311,9 @@ static bool kick_pool(struct worker_pool *pool)
 		}
 	}
 #endif
+	/* Track the last idle worker woken, used for stall diagnostics. */
+	pool->last_woken_worker = worker;
+
 	wake_up_process(p);
 	return true;
 }
@@ -2948,6 +2952,13 @@ static void set_worker_dying(struct worker *worker, struct list_head *list)
 	pool->nr_workers--;
 	pool->nr_idle--;
 
+	/*
+	 * Clear last_woken_worker if it points to this worker, so that
+	 * show_cpu_pool_busy_workers() cannot dereference a freed worker.
+	 */
+	if (pool->last_woken_worker == worker)
+		pool->last_woken_worker = NULL;
+
 	worker->flags |= WORKER_DIE;
 
 	list_move(&worker->entry, list);
@@ -7707,13 +7718,25 @@ static void show_pool_no_running_worker(struct worker_pool *pool)
 		idle_cpu(pool->cpu) ? "idle" : "busy",
 		pool->nr_workers, pool->nr_idle);
 	pr_info("The pool might have trouble waking an idle worker.\n");
+	/*
+	 * last_woken_worker and its task are valid here: set_worker_dying()
+	 * clears it under pool->lock before setting WORKER_DIE, so if
+	 * last_woken_worker is non-NULL the kthread has not yet exited and
+	 * worker->task is still alive.
+	 */
+	if (pool->last_woken_worker) {
+		pr_info("Backtrace of last woken worker:\n");
+		sched_show_task(pool->last_woken_worker->task);
+	} else {
+		pr_info("Last woken worker empty\n");
+	}
 	printk_deferred_exit();
 }
 
 /*
  * Show running workers that might prevent the processing of pending work items.
  * If no running worker is found, the pool may be stuck waiting for an idle
- * worker to be woken, so report the pool state.
+ * worker to be woken, so report the pool state and the last woken worker.
  */
 static void show_cpu_pool_busy_workers(struct worker_pool *pool)
 {
@@ -7748,7 +7771,8 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
 
 	/*
 	 * If no running worker was found, the pool is likely stuck. Print pool
-	 * state.
+	 * state and the backtrace of the last woken worker, which is the prime
+	 * suspect for the stall.
 	 */
 	if (!found_running)
 		show_pool_no_running_worker(pool);

-- 
2.53.0-Meta



* Re: [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker
  2026-06-30 16:15 [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
                   ` (2 preceding siblings ...)
  2026-06-30 16:15 ` [PATCH v2 3/3] workqueue: dump the last woken worker " Breno Leitao
@ 2026-06-30 16:54 ` Tejun Heo
  2026-06-30 16:59   ` Breno Leitao
  3 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2026-06-30 16:54 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Lai Jiangshan, Song Liu, linux-kernel, Petr Mladek,
	marco.crivellari, david.dai, kernel-team

Hello, Breno.

Applied 1-2 to wq/for-7.3.

3 doesn't apply on for-7.3. Can you rebase and resend?

Thanks.
--
tejun


* Re: [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker
  2026-06-30 16:54 ` [PATCH v2 0/3] workqueue: improve stall diagnostics for pools with no running worker Tejun Heo
@ 2026-06-30 16:59   ` Breno Leitao
  0 siblings, 0 replies; 6+ messages in thread
From: Breno Leitao @ 2026-06-30 16:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lai Jiangshan, Song Liu, linux-kernel, Petr Mladek,
	marco.crivellari, david.dai, kernel-team

Hello Tejun,

On Tue, Jun 30, 2026 at 06:54:21AM -1000, Tejun Heo wrote:
> Hello, Breno.
> 
> Applied 1-2 to wq/for-7.3.

Thanks

> 3 doesn't apply on for-7.3. Can you rebase and resend?

Damn, I am sorry, my fault.

I will rebase and resend.
Thanks!

