* [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics
2026-06-16 16:44 [PATCH RFC 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
@ 2026-06-16 16:44 ` Breno Leitao
2026-06-19 12:58 ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
2026-06-16 16:44 ` [PATCH RFC 3/3] workqueue: dump the last woken worker " Breno Leitao
2 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2026-06-16 16:44 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, Song Liu
Cc: linux-kernel, pmladek, Breno Leitao, kernel-team
show_cpu_pool_busy_workers() dumps every in-flight worker in the pool's
busy_hash, including workers that are not currently running on the CPU.
Restore the task_is_running() filter so only running workers are dumped.
When no running worker is found the pool may be stuck, unable to wake an
idle worker to process pending work, and the watchdog would otherwise
give no feedback. Add show_pool_no_running_worker() to report the pool
id, CPU, idle state, and worker counts in that case.
The pool info message is printed inside pool->lock using
printk_deferred_enter/exit, the same pattern used by the existing
busy-worker loop, to avoid deadlocks with console drivers that queue
work while holding locks also taken in their write paths.
This has been running on the Meta fleet for a while and caught some real
issues, for instance EFI stalls stalling the workqueue [1].
Link: https://lore.kernel.org/all/20260616-efi_timeout-v3-0-76dd1d26657b@debian.org/ [1]
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Fixes: 8823eaef45da7 ("workqueue: Show all busy workers in stall diagnostics")
---
kernel/workqueue.c | 38 ++++++++++++++++++++++++++++++++++----
1 file changed, 34 insertions(+), 4 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78f25afb4a9d6..efbac160b7628 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7693,13 +7693,31 @@ module_param_named(panic_on_stall_time, wq_panic_on_stall_time, uint, 0644);
MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds (0=disabled)");
/*
- * Show workers that might prevent the processing of pending work items.
- * A busy worker that is not running on the CPU (e.g. sleeping in
- * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
- * effectively as a CPU-bound one, so dump every in-flight worker.
+ * Report that a pool has no worker in running state, which is a sign that the
+ * pool may be stuck. Print pool info. Must be called with pool->lock held; the
+ * output is deferred internally via printk_deferred_enter/exit.
+ */
+static void show_pool_no_running_worker(struct worker_pool *pool)
+{
+ lockdep_assert_held(&pool->lock);
+
+ printk_deferred_enter();
+ pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+ pool->id, pool->cpu,
+ idle_cpu(pool->cpu) ? "idle" : "busy",
+ pool->nr_workers, pool->nr_idle);
+ pr_info("The pool might have trouble waking an idle worker.\n");
+ printk_deferred_exit();
+}
+
+/*
+ * Show running workers that might prevent the processing of pending work items.
+ * If no running worker is found, the pool may be stuck waiting for an idle
+ * worker to be woken, so report the pool state.
*/
static void show_cpu_pool_busy_workers(struct worker_pool *pool)
{
+ bool found_running = false;
struct worker *worker;
unsigned long irq_flags;
int bkt;
@@ -7707,6 +7725,11 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
raw_spin_lock_irqsave(&pool->lock, irq_flags);
hash_for_each(pool->busy_hash, bkt, worker, hentry) {
+ /* Skip workers that are not actively running on the CPU. */
+ if (!task_is_running(worker->task))
+ continue;
+
+ found_running = true;
/*
* Defer printing to avoid deadlocks in console
* drivers that queue work while holding locks
@@ -7720,6 +7743,13 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
printk_deferred_exit();
}
+ /*
+ * If no running worker was found, the pool is likely stuck. Print pool
+ * state.
+ */
+ if (!found_running)
+ show_pool_no_running_worker(pool);
+
raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
}
--
2.53.0-Meta
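
A minimal user-space sketch of the control flow in the patch above, for readers who want to see the filter and the fallback report side by side. Everything prefixed with model_ is hypothetical: the "running" flag stands in for task_is_running(worker->task), the "locked" flag stands in for holding pool->lock, a plain array stands in for busy_hash, and the idle_cpu() field of the real message is left out. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct worker and struct worker_pool. */
struct model_worker {
	const char *name;
	bool running;		/* stands in for task_is_running(worker->task) */
};

struct model_pool {
	int id;
	int cpu;
	bool locked;		/* stands in for holding pool->lock */
	int nr_workers;
	int nr_idle;
	const struct model_worker *busy;	/* stands in for busy_hash */
	size_t nr_busy;
};

/* Models show_pool_no_running_worker(): the caller must hold the lock. */
static void model_show_no_running_worker(const struct model_pool *pool)
{
	assert(pool->locked);	/* plays the role of lockdep_assert_held() */
	printf("pool %d: no worker in running state, cpu=%d (nr_workers=%d nr_idle=%d)\n",
	       pool->id, pool->cpu, pool->nr_workers, pool->nr_idle);
}

/*
 * Models show_cpu_pool_busy_workers(): dump only running workers and
 * report the pool state when none is found. Returns the number of
 * workers dumped so that a caller can check the outcome.
 */
static int model_show_busy_workers(struct model_pool *pool)
{
	bool found_running = false;
	int dumped = 0;
	size_t i;

	pool->locked = true;
	for (i = 0; i < pool->nr_busy; i++) {
		if (!pool->busy[i].running)
			continue;	/* skip in-flight workers that sleep */
		found_running = true;
		printf("pool %d: dumping running worker %s\n",
		       pool->id, pool->busy[i].name);
		dumped++;
	}
	if (!found_running)
		model_show_no_running_worker(pool);
	pool->locked = false;

	return dumped;
}
```

With one sleeping and one running worker in the array, only the running one is dumped; once both sleep, the fallback pool report is printed instead.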
* Re: [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics
2026-06-16 16:44 ` [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
@ 2026-06-19 12:58 ` Petr Mladek
0 siblings, 0 replies; 7+ messages in thread
From: Petr Mladek @ 2026-06-19 12:58 UTC (permalink / raw)
To: Breno Leitao
Cc: Tejun Heo, Lai Jiangshan, Song Liu, linux-kernel, kernel-team
On Tue 2026-06-16 09:44:39, Breno Leitao wrote:
> show_cpu_pool_busy_workers() dumps every in-flight worker in the pool's
> busy_hash, including workers that are not currently running on the CPU.
> Restore the task_is_running() filter so only running workers are dumped.
>
> When no running worker is found the pool may be stuck, unable to wake an
> idle worker to process pending work, and the watchdog would otherwise
> give no feedback. Add show_pool_no_running_worker() to report the pool
> id, CPU, idle state, and worker counts in that case.
>
> The pool info message is printed inside pool->lock using
> printk_deferred_enter/exit, the same pattern used by the existing
> busy-worker loop, to avoid deadlocks with console drivers that queue
> work while holding locks also taken in their write paths.
>
> This has been running on the Meta fleet for a while and caught some real
> issues, for instance EFI stalls stalling the workqueue [1].
>
> Link: https://lore.kernel.org/all/20260616-efi_timeout-v3-0-76dd1d26657b@debian.org/ [1]
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Fixes: 8823eaef45da7 ("workqueue: Show all busy workers in stall diagnostics")
It looks good to me. And it is good to know that it helped in
real life.
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
* [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools
2026-06-16 16:44 [PATCH RFC 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
2026-06-16 16:44 ` [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
@ 2026-06-16 16:44 ` Breno Leitao
2026-06-19 13:42 ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 3/3] workqueue: dump the last woken worker " Breno Leitao
2 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2026-06-16 16:44 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, Song Liu
Cc: linux-kernel, pmladek, Breno Leitao, kernel-team
When a CPU pool is stalled with no running worker, the task occupying the
CPU may not be a workqueue worker at all. Trigger a single-CPU backtrace
for the stalled CPU to capture what it is currently executing.
The CPU is snapshotted under pool->lock and the backtrace is triggered
after releasing the lock to avoid any potential issues with NMI delivery.
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index efbac160b7628..db6287cd39588 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7720,10 +7720,13 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
bool found_running = false;
struct worker *worker;
unsigned long irq_flags;
- int bkt;
+ int cpu, bkt;
raw_spin_lock_irqsave(&pool->lock, irq_flags);
+ /* Snapshot cpu inside the lock to safely use it after unlock. */
+ cpu = pool->cpu;
+
hash_for_each(pool->busy_hash, bkt, worker, hentry) {
/* Skip workers that are not actively running on the CPU. */
if (!task_is_running(worker->task))
@@ -7751,6 +7754,14 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
show_pool_no_running_worker(pool);
raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+ /*
+ * Trigger a backtrace on the stalled CPU to capture what it is
+ * currently executing. Called after releasing the lock to avoid
+ * any potential issues with NMI delivery.
+ */
+ if (!found_running)
+ trigger_single_cpu_backtrace(cpu);
}
static void show_cpu_pools_busy_workers(void)
--
2.53.0-Meta
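
The "snapshot under the lock, act after unlock" pattern described in the commit message can be sketched in user space as follows. This is only an illustration under stated assumptions: the snap_ names are hypothetical, the "locked" flag stands in for pool->lock, and snap_trigger_backtrace() merely records the CPU where the patch calls trigger_single_cpu_backtrace().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct worker_pool used here. */
struct snap_pool {
	bool locked;		/* stands in for holding pool->lock */
	int cpu;
	int nr_running;		/* number of busy workers that are running */
};

/* Records the CPU that would have been asked for a backtrace. */
static int snap_last_backtrace_cpu = -1;

/* Stand-in for trigger_single_cpu_backtrace(); must run unlocked. */
static void snap_trigger_backtrace(const struct snap_pool *pool, int cpu)
{
	assert(!pool->locked);
	snap_last_backtrace_cpu = cpu;
}

/* Returns true when the pool had no running worker. */
static bool snap_check_pool(struct snap_pool *pool)
{
	bool found_running;
	int cpu;

	pool->locked = true;
	cpu = pool->cpu;			/* snapshot inside the lock */
	found_running = pool->nr_running > 0;
	pool->locked = false;

	/* Act on the snapshot only after the lock has been dropped. */
	if (!found_running)
		snap_trigger_backtrace(pool, cpu);

	return !found_running;
}
```

The assert in snap_trigger_backtrace() makes the ordering requirement explicit: the sketch aborts if the backtrace request is ever issued while the lock flag is still set.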
* Re: [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools
2026-06-16 16:44 ` [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
@ 2026-06-19 13:42 ` Petr Mladek
0 siblings, 0 replies; 7+ messages in thread
From: Petr Mladek @ 2026-06-19 13:42 UTC (permalink / raw)
To: Breno Leitao
Cc: Tejun Heo, Lai Jiangshan, Song Liu, linux-kernel, kernel-team
On Tue 2026-06-16 09:44:40, Breno Leitao wrote:
> When a CPU pool is stalled with no running worker, the task occupying the
> CPU may not be a workqueue worker at all. Trigger a single-CPU backtrace
> for the stalled CPU to capture what it is currently executing.
>
> The CPU is snapshotted under pool->lock and the backtrace is triggered
> after releasing the lock to avoid any potential issues with NMI delivery.
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7751,6 +7754,14 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
> show_pool_no_running_worker(pool);
>
> raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
> +
> + /*
> + * Trigger a backtrace on the stalled CPU to capture what it is
> + * currently executing. Called after releasing the lock to avoid
> + * any potential issues with NMI delivery.
> + */
> + if (!found_running)
> + trigger_single_cpu_backtrace(cpu);
> }
Sashiko AI is curious whether this might be racy against CPU hotplug,
see
https://sashiko.dev/#/patchset/20260616-wq_dump_petr-v1-0-b57473ca6d18%40debian.org
<paste Sashiko comment>
Is it possible for this CPU to be offline when we trigger the backtrace?
When a CPU goes offline, its bound workqueue pools are disassociated via
unbind_workers() but retain their original pool->cpu ID. If a work item on
a disassociated pool hangs, the watchdog could detect the stall and invoke
show_cpu_pool_busy_workers().
If the target CPU is offline, it cannot process the NMI or clear its
completion bit in the backtrace mask. Does this cause
nmi_trigger_cpumask_backtrace() to busy-wait for the 10-second timeout
waiting for a response that will never arrive? Since the watchdog executes
in a softirq timer context, this could stall the CPU running the watchdog
for 10 seconds.
Should this check cpu_online(cpu) before triggering the backtrace?
</paste Sashiko comment>
It makes some sense. wq_watchdog_timer_fn() checks either
'per_cpu(wq_watchdog_touched_cpu)' or the global 'wq_watchdog_touched',
depending on whether pool->cpu is set or not. And that seems to be wrong
for disassociated pools.
But this seems to be an existing problem which should be fixed
separately.
Also wq_watchdog_timer_fn() is checking the state without taking
proper locks. So it is inherently racy. But it is another story.
Summary:
This particular change looks good to me. And I believe that it
is useful.
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
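
For illustration only, the guard asked about in the pasted Sashiko comment (checking that the CPU is online before requesting a backtrace) could look like the user-space sketch below. It is not part of the posted patch, and the reply above treats the hotplug concern as an existing problem to be fixed separately. All guard_ names are hypothetical: the bitmask stands in for the kernel's online CPU mask, guard_cpu_online() for cpu_online(), and the counter for the call to trigger_single_cpu_backtrace().

```c
#include <assert.h>
#include <stdbool.h>

#define GUARD_NR_CPUS 8

/* Hypothetical online mask: bit N set means CPU N is online. */
static unsigned int guard_online_mask = 0xffu;

/* Counts how many backtrace requests would have been sent. */
static int guard_backtraces_sent;

/* Stand-in for cpu_online(); negative IDs are treated as offline. */
static bool guard_cpu_online(int cpu)
{
	assert(cpu < GUARD_NR_CPUS);
	return cpu >= 0 && (guard_online_mask & (1u << cpu)) != 0;
}

/* Requests a backtrace only when the target CPU is online. */
static bool guard_backtrace_if_online(int cpu)
{
	if (!guard_cpu_online(cpu))
		return false;
	guard_backtraces_sent++;
	return true;
}
```

Note that a check of this kind is still racy against hotplug unless the caller also prevents the CPU from going offline between the check and the request, which the sketch does not model.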
* [PATCH RFC 3/3] workqueue: dump the last woken worker for stalled pools
2026-06-16 16:44 [PATCH RFC 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
2026-06-16 16:44 ` [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
2026-06-16 16:44 ` [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
@ 2026-06-16 16:44 ` Breno Leitao
2026-06-19 15:40 ` Petr Mladek
2 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2026-06-16 16:44 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, Song Liu
Cc: linux-kernel, pmladek, Breno Leitao, kernel-team
To identify the task most likely responsible for a stall, add
last_woken_worker (L: pool->lock) to worker_pool and record it in
kick_pool() just before wake_up_process(). This captures the idle
worker that was kicked to take over when the last running worker went to
sleep; if the pool is now stuck with no running worker, that task is the
prime suspect and its backtrace is dumped by show_pool_no_running_worker().
Using struct worker * rather than struct task_struct * avoids any
lifetime concern: workers are only destroyed via set_worker_dying()
which requires pool->lock, and set_worker_dying() clears
last_woken_worker when the dying worker matches.
show_cpu_pool_busy_workers() holds pool->lock while calling
sched_show_task(), so last_woken_worker is either NULL or points to a
live worker with a valid task. More precisely, set_worker_dying() clears
last_woken_worker before setting WORKER_DIE, so a non-NULL
last_woken_worker means the kthread has not yet exited and worker->task
is still alive.
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index db6287cd39588..6870b765c9ac8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -226,6 +226,7 @@ struct worker_pool {
/* L: hash of busy workers */
struct worker *manager; /* L: purely informational */
+ struct worker *last_woken_worker; /* L: last worker woken by kick_pool() */
struct list_head workers; /* A: attached workers */
struct ida worker_ida; /* worker IDs for task name */
@@ -1310,6 +1311,9 @@ static bool kick_pool(struct worker_pool *pool)
}
}
#endif
+ /* Track the last idle worker woken, used for stall diagnostics. */
+ pool->last_woken_worker = worker;
+
wake_up_process(p);
return true;
}
@@ -2948,6 +2952,13 @@ static void set_worker_dying(struct worker *worker, struct list_head *list)
pool->nr_workers--;
pool->nr_idle--;
+ /*
+ * Clear last_woken_worker if it points to this worker, so that
+ * show_cpu_pool_busy_workers() cannot dereference a freed worker.
+ */
+ if (pool->last_woken_worker == worker)
+ pool->last_woken_worker = NULL;
+
worker->flags |= WORKER_DIE;
list_move(&worker->entry, list);
@@ -7707,13 +7718,25 @@ static void show_pool_no_running_worker(struct worker_pool *pool)
idle_cpu(pool->cpu) ? "idle" : "busy",
pool->nr_workers, pool->nr_idle);
pr_info("The pool might have trouble waking an idle worker.\n");
+ /*
+ * last_woken_worker and its task are valid here: set_worker_dying()
+ * clears it under pool->lock before setting WORKER_DIE, so if
+ * last_woken_worker is non-NULL the kthread has not yet exited and
+ * worker->task is still alive.
+ */
+ if (pool->last_woken_worker) {
+ pr_info("Backtrace of last woken worker:\n");
+ sched_show_task(pool->last_woken_worker->task);
+ } else {
+ pr_info("Last woken worker empty\n");
+ }
printk_deferred_exit();
}
/*
* Show running workers that might prevent the processing of pending work items.
* If no running worker is found, the pool may be stuck waiting for an idle
- * worker to be woken, so report the pool state.
+ * worker to be woken, so report the pool state and the last woken worker.
*/
static void show_cpu_pool_busy_workers(struct worker_pool *pool)
{
@@ -7748,7 +7771,8 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
/*
* If no running worker was found, the pool is likely stuck. Print pool
- * state.
+ * state and the backtrace of the last woken worker, which is the prime
+ * suspect for the stall.
*/
if (!found_running)
show_pool_no_running_worker(pool);
--
2.53.0-Meta
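
The lifetime argument in the commit message (record the pointer in kick_pool(), clear it in set_worker_dying() before the worker is marked dying) can be modelled in user space as below. This is a sketch under stated assumptions, not kernel code: the lw_ names are hypothetical, locking is left out because the model is single threaded, and the "dying" flag stands in for WORKER_DIE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct worker and struct worker_pool. */
struct lw_worker {
	int id;
	bool dying;		/* stands in for WORKER_DIE */
};

struct lw_pool {
	struct lw_worker *last_woken_worker;
};

/* Models kick_pool(): remember the idle worker just before waking it. */
static void lw_kick_pool(struct lw_pool *pool, struct lw_worker *idle)
{
	pool->last_woken_worker = idle;
	/* The real code calls wake_up_process() at this point. */
}

/* Models set_worker_dying(): drop the reference first, then mark dying. */
static void lw_set_worker_dying(struct lw_pool *pool, struct lw_worker *worker)
{
	if (pool->last_woken_worker == worker)
		pool->last_woken_worker = NULL;
	worker->dying = true;
}

/* Returns the ID of the suspect worker, or -1 when none is recorded. */
static int lw_suspect_id(const struct lw_pool *pool)
{
	if (!pool->last_woken_worker)
		return -1;
	/* Invariant from the commit message: a recorded worker is not dying. */
	assert(!pool->last_woken_worker->dying);
	return pool->last_woken_worker->id;
}
```

The assert in lw_suspect_id() encodes the invariant the patch relies on: a non-NULL last_woken_worker never points at a worker that has already been marked dying.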
* Re: [PATCH RFC 3/3] workqueue: dump the last woken worker for stalled pools
2026-06-16 16:44 ` [PATCH RFC 3/3] workqueue: dump the last woken worker " Breno Leitao
@ 2026-06-19 15:40 ` Petr Mladek
0 siblings, 0 replies; 7+ messages in thread
From: Petr Mladek @ 2026-06-19 15:40 UTC (permalink / raw)
To: Breno Leitao
Cc: Tejun Heo, Lai Jiangshan, Song Liu, linux-kernel, kernel-team
On Tue 2026-06-16 09:44:41, Breno Leitao wrote:
> To identify the task most likely responsible for a stall, add
> last_woken_worker (L: pool->lock) to worker_pool and record it in
> kick_pool() just before wake_up_process(). This captures the idle
> worker that was kicked to take over when the last running worker went to
> sleep; if the pool is now stuck with no running worker, that task is the
> prime suspect and its backtrace is dumped by show_pool_no_running_worker().
>
> Using struct worker * rather than struct task_struct * avoids any
> lifetime concern: workers are only destroyed via set_worker_dying()
> which requires pool->lock, and set_worker_dying() clears
> last_woken_worker when the dying worker matches.
> show_cpu_pool_busy_workers() holds pool->lock while calling
> sched_show_task(), so last_woken_worker is either NULL or points to a
> live worker with a valid task. More precisely, set_worker_dying() clears
> last_woken_worker before setting WORKER_DIE, so a non-NULL
> last_woken_worker means the kthread has not yet exited and worker->task
> is still alive.
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -226,6 +226,7 @@ struct worker_pool {
> /* L: hash of busy workers */
>
> struct worker *manager; /* L: purely informational */
> + struct worker *last_woken_worker; /* L: last worker woken by kick_pool() */
I thought more about it. "last_woken_worker" and "manager" are
related and they might eventually duplicate the information.
If I understand it correctly, kick_pool() wakes a worker when needed.
The last worker becomes a "manager" and tries to create a new
worker.
IMHO, in most situations "manager" will have the same value as
"last_woken_worker". But it is not guaranteed because "pool->lock"
is not held all the time.
There are two questions:
1. Do we need both values?
IMHO, we do:
+ "last_woken_worker" is the last woken worker. It is supposed to
guarantee forward progress. Its backtrace is interesting
because the worker might never get scheduled.
+ "manager" is the last "idle" worker which is actively trying to
create a new worker. It is supposed to guarantee forward progress
too. IMHO, it usually will be the "last_woken_worker" but it is
not guaranteed as mentioned above.
2. Should we print backtrace of both?
Probably not both at the same time:
+ We should print "manager" when it is set. It is set when a new
worker has to be created. And the "manager" is responsible for
the forward progress, definitely.
+ We should print "last_woken_worker" when "manager" is not set.
It is the only clue. And it likely got stuck for some reason.
+ IMHO, "last_woken_worker" need not be printed when "manager"
is set even when it is a different worker. The "manager" is
the really responsible worker. And "last_woken_worker" likely
just started processing work items because it somehow raced with
the manager.
Does this make sense, please?
We could also add the "manager" printing in a separate patch later.
This patch is a good step forward. Feel free to use:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
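
The selection policy proposed in the review above (prefer "manager" when it is set, otherwise fall back to "last_woken_worker") can be sketched in user space as follows. This only illustrates the suggestion; it is not in the posted patch, and the sel_ names are hypothetical stand-ins for the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct worker and struct worker_pool. */
struct sel_worker {
	const char *name;
};

struct sel_pool {
	struct sel_worker *manager;
	struct sel_worker *last_woken_worker;
};

/*
 * Picks the worker whose backtrace would be printed: the manager when
 * one is set, else the last woken worker, else NULL.
 */
static const struct sel_worker *sel_pick_suspect(const struct sel_pool *pool)
{
	if (pool->manager)
		return pool->manager;
	return pool->last_woken_worker;
}

/* Usage example covering the three cases discussed in the review. */
void sel_self_check(void)
{
	struct sel_worker mgr = { "manager" }, woken = { "last woken" };
	struct sel_pool pool = { NULL, &woken };

	assert(sel_pick_suspect(&pool) == &woken);	/* only clue available */
	pool.manager = &mgr;
	assert(sel_pick_suspect(&pool) == &mgr);	/* manager takes priority */
	pool.manager = NULL;
	pool.last_woken_worker = NULL;
	assert(sel_pick_suspect(&pool) == NULL);	/* nothing recorded */
}
```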