* [PATCH 1/4] workqueue: Use POOL_BH instead of WQ_BH when checking pool flags
2026-02-11 12:29 [PATCH 0/4] workqueue: Detect stalled in-flight workers Breno Leitao
@ 2026-02-11 12:29 ` Breno Leitao
2026-02-11 12:29 ` [PATCH 2/4] workqueue: Rename pool->watchdog_ts to pool->last_progress_ts Breno Leitao
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2026-02-11 12:29 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, Andrew Morton
Cc: linux-kernel, Omar Sandoval, kernel-team, Breno Leitao
pr_cont_worker_id() checks pool->flags against WQ_BH, which is a
workqueue-level flag (defined in workqueue.h). Pool flags use a
separate namespace with POOL_* constants (defined in workqueue.c).
The correct constant is POOL_BH. Both WQ_BH and POOL_BH are defined
as (1 << 0) so this has no behavioral impact, but it is semantically
wrong and inconsistent with every other pool-level BH check in the
file.
Fixes: 4cb1ef64609f ("workqueue: Implement BH workqueues to eventually replace tasklets")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e6c249f2fb46b..265d841e1b81c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -6274,7 +6274,7 @@ static void pr_cont_worker_id(struct worker *worker)
{
struct worker_pool *pool = worker->pool;
- if (pool->flags & WQ_BH)
+ if (pool->flags & POOL_BH)
pr_cont("bh%s",
pool->attrs->nice == HIGHPRI_NICE_LEVEL ? "-hi" : "");
else
--
2.47.3
* [PATCH 2/4] workqueue: Rename pool->watchdog_ts to pool->last_progress_ts
2026-02-11 12:29 [PATCH 0/4] workqueue: Detect stalled in-flight workers Breno Leitao
2026-02-11 12:29 ` [PATCH 1/4] workqueue: Use POOL_BH instead of WQ_BH when checking pool flags Breno Leitao
@ 2026-02-11 12:29 ` Breno Leitao
2026-02-11 12:29 ` [PATCH 3/4] workqueue: Show in-flight work item duration in stall diagnostics Breno Leitao
` (2 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2026-02-11 12:29 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, Andrew Morton
Cc: linux-kernel, Omar Sandoval, kernel-team, Breno Leitao
The watchdog_ts name doesn't convey what the timestamp actually tracks:
the last time the pool made progress.
Rename it to last_progress_ts to make it clear that it records when the
pool last made forward progress (started processing new work items).
No functional change.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 265d841e1b81c..b3ba739cf493a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -190,7 +190,7 @@ struct worker_pool {
int id; /* I: pool ID */
unsigned int flags; /* L: flags */
- unsigned long watchdog_ts; /* L: watchdog timestamp */
+ unsigned long last_progress_ts; /* L: last forward progress timestamp */
bool cpu_stall; /* WD: stalled cpu bound pool */
/*
@@ -1697,7 +1697,7 @@ static void __pwq_activate_work(struct pool_workqueue *pwq,
WARN_ON_ONCE(!(*wdb & WORK_STRUCT_INACTIVE));
trace_workqueue_activate_work(work);
if (list_empty(&pwq->pool->worklist))
- pwq->pool->watchdog_ts = jiffies;
+ pwq->pool->last_progress_ts = jiffies;
move_linked_works(work, &pwq->pool->worklist, NULL);
__clear_bit(WORK_STRUCT_INACTIVE_BIT, wdb);
}
@@ -2348,7 +2348,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
*/
if (list_empty(&pwq->inactive_works) && pwq_tryinc_nr_active(pwq, false)) {
if (list_empty(&pool->worklist))
- pool->watchdog_ts = jiffies;
+ pool->last_progress_ts = jiffies;
trace_workqueue_activate_work(work);
insert_work(pwq, work, &pool->worklist, work_flags);
@@ -3352,7 +3352,7 @@ static void process_scheduled_works(struct worker *worker)
while ((work = list_first_entry_or_null(&worker->scheduled,
struct work_struct, entry))) {
if (first) {
- worker->pool->watchdog_ts = jiffies;
+ worker->pool->last_progress_ts = jiffies;
first = false;
}
process_one_work(worker, work);
@@ -4850,7 +4850,7 @@ static int init_worker_pool(struct worker_pool *pool)
pool->cpu = -1;
pool->node = NUMA_NO_NODE;
pool->flags |= POOL_DISASSOCIATED;
- pool->watchdog_ts = jiffies;
+ pool->last_progress_ts = jiffies;
INIT_LIST_HEAD(&pool->worklist);
INIT_LIST_HEAD(&pool->idle_list);
hash_init(pool->busy_hash);
@@ -6462,7 +6462,7 @@ static void show_one_worker_pool(struct worker_pool *pool)
/* How long the first pending work is waiting for a worker. */
if (!list_empty(&pool->worklist))
- hung = jiffies_to_msecs(jiffies - pool->watchdog_ts) / 1000;
+ hung = jiffies_to_msecs(jiffies - pool->last_progress_ts) / 1000;
/*
* Defer printing to avoid deadlocks in console drivers that
@@ -7688,7 +7688,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));
else
touched = READ_ONCE(wq_watchdog_touched);
- pool_ts = READ_ONCE(pool->watchdog_ts);
+ pool_ts = READ_ONCE(pool->last_progress_ts);
if (time_after(pool_ts, touched))
ts = pool_ts;
--
2.47.3
* [PATCH 3/4] workqueue: Show in-flight work item duration in stall diagnostics
2026-02-11 12:29 [PATCH 0/4] workqueue: Detect stalled in-flight workers Breno Leitao
2026-02-11 12:29 ` [PATCH 1/4] workqueue: Use POOL_BH instead of WQ_BH when checking pool flags Breno Leitao
2026-02-11 12:29 ` [PATCH 2/4] workqueue: Rename pool->watchdog_ts to pool->last_progress_ts Breno Leitao
@ 2026-02-11 12:29 ` Breno Leitao
2026-02-11 12:29 ` [PATCH 4/4] workqueue: Detect stalled in-flight work items with empty worklist Breno Leitao
2026-02-11 18:56 ` [PATCH 0/4] workqueue: Detect stalled in-flight workers Tejun Heo
4 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2026-02-11 12:29 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, Andrew Morton
Cc: linux-kernel, Omar Sandoval, kernel-team, Breno Leitao
When diagnosing workqueue stalls, knowing how long each in-flight work
item has been executing is valuable. Add a current_start timestamp
(jiffies) to struct worker, set it when a work item begins execution in
process_one_work(), and print the elapsed wall-clock time in show_pwq().
Unlike current_at (which tracks CPU runtime and resets on wakeup for
CPU-intensive detection), current_start is never reset because the
diagnostic cares about total wall-clock time including sleeps.
Before: in-flight: 165:stall_work_fn [wq_stall]
After: in-flight: 165:stall_work_fn [wq_stall] for 100s
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 3 +++
kernel/workqueue_internal.h | 1 +
2 files changed, 4 insertions(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b3ba739cf493a..e527e763162e6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3204,6 +3204,7 @@ __acquires(&pool->lock)
worker->current_pwq = pwq;
if (worker->task)
worker->current_at = worker->task->se.sum_exec_runtime;
+ worker->current_start = jiffies;
work_data = *work_data_bits(work);
worker->current_color = get_work_color(work_data);
@@ -6359,6 +6360,8 @@ static void show_pwq(struct pool_workqueue *pwq)
pr_cont(" %s", comma ? "," : "");
pr_cont_worker_id(worker);
pr_cont(":%ps", worker->current_func);
+ pr_cont(" for %us",
+ jiffies_to_msecs(jiffies - worker->current_start) / 1000);
list_for_each_entry(work, &worker->scheduled, entry)
pr_cont_work(false, work, &pcws);
pr_cont_work_flush(comma, (work_func_t)-1L, &pcws);
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index f6275944ada77..8def1ddc5a1bf 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -32,6 +32,7 @@ struct worker {
work_func_t current_func; /* K: function */
struct pool_workqueue *current_pwq; /* K: pwq */
u64 current_at; /* K: runtime at start or last wakeup */
+ unsigned long current_start; /* K: start time of current work item */
unsigned int current_color; /* K: color */
int sleeping; /* S: is worker sleeping? */
--
2.47.3
* [PATCH 4/4] workqueue: Detect stalled in-flight work items with empty worklist
2026-02-11 12:29 [PATCH 0/4] workqueue: Detect stalled in-flight workers Breno Leitao
` (2 preceding siblings ...)
2026-02-11 12:29 ` [PATCH 3/4] workqueue: Show in-flight work item duration in stall diagnostics Breno Leitao
@ 2026-02-11 12:29 ` Breno Leitao
2026-02-11 18:56 ` [PATCH 0/4] workqueue: Detect stalled in-flight workers Tejun Heo
4 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2026-02-11 12:29 UTC (permalink / raw)
To: Tejun Heo, Lai Jiangshan, Andrew Morton
Cc: linux-kernel, Omar Sandoval, kernel-team, Breno Leitao
The workqueue watchdog skips pools with an empty worklist, assuming no
work is pending. However, a single work item that was dequeued and is
now executing on a worker will leave the worklist empty while the worker
is stuck. This means a pool with one hogged worker and no pending work
is invisible to the watchdog.
An example is something like:
static void stall_work_fn(struct work_struct *work)
{
for (;;) {
mdelay(1000);
cond_resched();
}
}
Fix this by scanning the pool's busy_hash for workers whose
current_start timestamp exceeds the watchdog threshold, independent of
worklist state. The new report_stalled_workers() function iterates all
in-flight workers in a pool and reports each one that has exceeded the
threshold, running as a separate detection path alongside the existing
pool-level last_progress_ts check.
This is an example of the report:
BUG: workqueue lockup - worker 365:stall_work_fn [wq_stall] stuck in pool cpus=9 node=0 flags=0x0 nice=0 for 33s!
Showing busy workqueues and worker pools:
...
The feature is gated behind a new CONFIG_WQ_WATCHDOG_WORKERS option
(disabled by default) under CONFIG_WQ_WATCHDOG.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++--
lib/Kconfig.debug | 12 ++++++++++++
2 files changed, 62 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e527e763162e6..719e14aa4ac56 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7659,6 +7659,49 @@ static void wq_watchdog_reset_touched(void)
per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
}
+#ifdef CONFIG_WQ_WATCHDOG_WORKERS
+/*
+ * Scan all in-flight workers in @pool for stalls. A worker is considered
+ * stalled if its current work item has been executing for longer than @thresh
+ * based on its current_start timestamp. This catches workers that are stuck
+ * regardless of the pool's worklist state or last_progress_ts.
+ */
+static bool report_stalled_workers(struct worker_pool *pool,
+ unsigned long now,
+ unsigned long thresh)
+{
+ struct worker *worker;
+ bool stall = false;
+ int bkt;
+
+ /*
+ * Iterate busy_hash without pool->lock. This is intentionally
+ * lockless to avoid contention in the watchdog timer path.
+ * Workers that have been stalled for thresh (typically 30s) are
+ * unlikely to be transitioning in/out of busy_hash concurrently.
+ */
+ hash_for_each(pool->busy_hash, bkt, worker, hentry) {
+ if (time_after(now, worker->current_start + thresh)) {
+ pr_emerg("BUG: workqueue lockup - worker ");
+ pr_cont_worker_id(worker);
+ pr_cont(":%ps stuck in pool",
+ worker->current_func);
+ pr_cont_pool_info(pool);
+ pr_cont(" for %us!\n",
+ jiffies_to_msecs(now - worker->current_start) / 1000);
+ stall = true;
+ }
+ }
+ return stall;
+}
+#else
+static bool report_stalled_workers(struct worker_pool *pool,
+ unsigned long now, unsigned long thresh)
+{
+ return false;
+}
+#endif /* CONFIG_WQ_WATCHDOG_WORKERS */
+
static void wq_watchdog_timer_fn(struct timer_list *unused)
{
unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
@@ -7677,8 +7720,6 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
unsigned long pool_ts, touched, ts;
pool->cpu_stall = false;
- if (list_empty(&pool->worklist))
- continue;
/*
* If a virtual machine is stopped by the host it can look to
@@ -7686,6 +7727,13 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
*/
kvm_check_and_clear_guest_paused();
+ /* Check for individual stalled workers in this pool. */
+ if (report_stalled_workers(pool, now, thresh))
+ lockup_detected = true;
+
+ if (list_empty(&pool->worklist))
+ continue;
+
/* get the latest of pool and touched timestamps */
if (pool->cpu >= 0)
touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ce25a8faf6e9e..dc4bb546b2033 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1320,6 +1320,18 @@ config BOOTPARAM_WQ_STALL_PANIC
This setting can be overridden at runtime via the
workqueue.panic_on_stall kernel parameter.
+config WQ_WATCHDOG_WORKERS
+ bool "Detect individual stalled workqueue workers"
+ depends on WQ_WATCHDOG
+ default n
+ help
+ Say Y here to enable per-worker stall detection. When enabled,
+ the workqueue watchdog scans all in-flight workers in each pool
+ and reports any whose current work item has been executing for
+ longer than the watchdog threshold. This catches stalled workers
+ even when the pool's worklist is empty or the pool has recently
+ made forward progress on other work items.
+
config WQ_CPU_INTENSIVE_REPORT
bool "Report per-cpu work items which hog CPU for too long"
depends on DEBUG_KERNEL
--
2.47.3
* Re: [PATCH 0/4] workqueue: Detect stalled in-flight workers
2026-02-11 12:29 [PATCH 0/4] workqueue: Detect stalled in-flight workers Breno Leitao
` (3 preceding siblings ...)
2026-02-11 12:29 ` [PATCH 4/4] workqueue: Detect stalled in-flight work items with empty worklist Breno Leitao
@ 2026-02-11 18:56 ` Tejun Heo
2026-03-04 15:40 ` Breno Leitao
4 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2026-02-11 18:56 UTC (permalink / raw)
To: Breno Leitao
Cc: Lai Jiangshan, Andrew Morton, linux-kernel, Omar Sandoval,
kernel-team
Hello,
On Wed, Feb 11, 2026 at 04:29:14AM -0800, Breno Leitao wrote:
> The workqueue watchdog detects pools that haven't made forward progress
> by checking whether pending work items on the worklist have been waiting
> too long. However, this approach has a blind spot: if a pool has only
> one work item and that item has already been dequeued and is executing on
> a worker, the worklist is empty and the watchdog skips the pool entirely.
> This means a single hogged worker with no other pending work is invisible
> to the stall detector.
>
> I was able to come up with the following example that shows this blind
> spot:
>
> static void stall_work_fn(struct work_struct *work)
> {
> for (;;) {
> mdelay(1000);
> cond_resched();
> }
> }
Workqueue doesn't require users to limit execution time. As long as there is
enough supply of concurrency to avoid stalling of pending work items, work
items can run as long as they want, including indefinitely. Workqueue stall
is there to indicate that there is insufficient supply of concurrency.
Thanks.
--
tejun
* Re: [PATCH 0/4] workqueue: Detect stalled in-flight workers
2026-02-11 18:56 ` [PATCH 0/4] workqueue: Detect stalled in-flight workers Tejun Heo
@ 2026-03-04 15:40 ` Breno Leitao
2026-03-04 16:40 ` Tejun Heo
0 siblings, 1 reply; 8+ messages in thread
From: Breno Leitao @ 2026-03-04 15:40 UTC (permalink / raw)
To: Tejun Heo
Cc: Lai Jiangshan, Andrew Morton, linux-kernel, Omar Sandoval,
kernel-team
Hello Tejun,
On Wed, Feb 11, 2026 at 08:56:11AM -1000, Tejun Heo wrote:
> On Wed, Feb 11, 2026 at 04:29:14AM -0800, Breno Leitao wrote:
> > The workqueue watchdog detects pools that haven't made forward progress
> > by checking whether pending work items on the worklist have been waiting
> > too long. However, this approach has a blind spot: if a pool has only
> > one work item and that item has already been dequeued and is executing on
> > a worker, the worklist is empty and the watchdog skips the pool entirely.
> > This means a single hogged worker with no other pending work is invisible
> > to the stall detector.
> >
> > I was able to come up with the following example that shows this blind
> > spot:
> >
> > static void stall_work_fn(struct work_struct *work)
> > {
> > for (;;) {
> > mdelay(1000);
> > cond_resched();
> > }
> > }
>
> Workqueue doesn't require users to limit execution time. As long as there is
> enough supply of concurrency to avoid stalling of pending work items, work
> items can run as long as they want, including indefinitely. Workqueue stall
> is there to indicate that there is insufficient supply of concurrency.
Thank you for the clarification. Let me share more context about the
actual problem I am observing so we can think through it together.
On some production hosts, I am seeing a workqueue stall where no
backtraces are printed:
BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 132s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x100
pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=4 refcnt=5
in-flight: 178:stall_work1_fn [wq_stall]
pending: stall_work2_fn [wq_stall], free_obj_work, psi_avgs_work
workqueue mm_percpu_wq: flags=0x108
pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: vmstat_update
pool 18: cpus=4 node=0 flags=0x0 nice=0 hung=132s workers=2 idle: 45
Showing backtraces of running workers in stalled CPU-bound worker pools:
<nothing here>
We initially suspected a TOCTOU issue, and Omar put together a patch to
address that, but it did not turn up anything.
After digging deeper, I believe I have found the root cause along with
a reproducer[1]:
1) kfence executes toggle_allocation_gate() as a delayed workqueue
item (kfence_timer) on the system WQ.
2) toggle_allocation_gate() enables a static key, which IPIs every
CPU to patch code:
static_branch_enable(&kfence_allocation_key);
3) toggle_allocation_gate() then sleeps in TASK_IDLE waiting for a
kfence allocation to occur:
wait_event_idle(allocation_wait,
atomic_read(&kfence_allocation_gate) > 0 || ...);
This can last indefinitely if no allocation goes through the
kfence path. The worker remains in the pool's busy_hash
(in-flight) but is no longer task_is_running().
4) The workqueue watchdog detects the stall and calls
show_cpu_pool_hog(), which only prints backtraces for workers
that are actively running on CPU:
static void show_cpu_pool_hog(struct worker_pool *pool) {
...
if (task_is_running(worker->task))
sched_show_task(worker->task);
}
5) Nothing is printed because the offending worker is in TASK_IDLE
state. The output shows "Showing backtraces of running workers in
stalled CPU-bound worker pools:" followed by nothing, effectively
hiding the actual culprit.
The fix I am considering is to remove the task_is_running() filter in
show_cpu_pool_hog() so that all in-flight workers in stalled pools have
their backtraces printed, regardless of whether they are running or
sleeping. This would make sleeping culprits like toggle_allocation_gate()
visible in the watchdog output.
When I test without the task_is_running() check, I see the culprit.
This is the fix I am testing:
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aeaec79bc09c4..3f5ee08f99313 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7593,19 +7593,17 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
raw_spin_lock_irqsave(&pool->lock, irq_flags);
hash_for_each(pool->busy_hash, bkt, worker, hentry) {
- if (task_is_running(worker->task)) {
- /*
- * Defer printing to avoid deadlocks in console
- * drivers that queue work while holding locks
- * also taken in their write paths.
- */
- printk_deferred_enter();
+ /*
+ * Defer printing to avoid deadlocks in console
+ * drivers that queue work while holding locks
+ * also taken in their write paths.
+ */
+ printk_deferred_enter();
- pr_info("pool %d:\n", pool->id);
- sched_show_task(worker->task);
+ pr_info("pool %d:\n", pool->id);
+ sched_show_task(worker->task);
- printk_deferred_exit();
- }
+ printk_deferred_exit();
}
raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
@@ -7616,7 +7614,7 @@ static void show_cpu_pools_hogs(void)
struct worker_pool *pool;
int pi;
- pr_info("Showing backtraces of running workers in stalled CPU-bound worker pools:\n");
+ pr_info("Showing backtraces of in-flight workers in stalled CPU-bound worker pools:\n");
rcu_read_lock();
Then I see:
BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 34s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x100
pwq 26: cpus=6 node=0 flags=0x0 nice=0 active=3 refcnt=4
in-flight: 161:stall_work1_fn [wq_stall]
pending: stall_work2_fn [wq_stall], psi_avgs_work
workqueue mm_percpu_wq: flags=0x108
pwq 26: cpus=6 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: vmstat_update
pool 26: cpus=6 node=0 flags=0x0 nice=0 hung=34s workers=3 idle: 210 57
Showing backtraces of in-flight workers in stalled CPU-bound worker pools:
pool 26:
task:kworker/6:1 state:I stack:0 pid:161 tgid:161 ppid:2 task_flags:0x4208040 flags:0x00080000
Call Trace:
<TASK>
__schedule+0x1521/0x5360
? console_trylock+0x40/0x40
? preempt_count_add+0x92/0x1a0
? do_raw_spin_lock+0x12c/0x2f0
? is_mmconf_reserved+0x390/0x390
? schedule+0x91/0x350
? schedule+0x91/0x350
schedule+0x165/0x350
stall_work1_fn+0x17f/0x250 [wq_stall]
Link: https://github.com/leitao/debug/blob/main/workqueue_stall/wq_stall.c [1]
* Re: [PATCH 0/4] workqueue: Detect stalled in-flight workers
2026-03-04 15:40 ` Breno Leitao
@ 2026-03-04 16:40 ` Tejun Heo
0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2026-03-04 16:40 UTC (permalink / raw)
To: Breno Leitao
Cc: Lai Jiangshan, Andrew Morton, linux-kernel, Omar Sandoval,
kernel-team
Hello,
On Wed, Mar 04, 2026 at 07:40:49AM -0800, Breno Leitao wrote:
> The fix I am considering is to remove the task_is_running() filter in
> show_cpu_pool_hog() so that all in-flight workers in stalled pools have
> their backtraces printed, regardless of whether they are running or
> sleeping. This would make sleeping culprits like toggle_allocation_gate()
> visible in the watchdog output.
Yeah, that makes sense to me.
Thanks.
--
tejun