public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 0/5] workqueue: Detect stalled in-flight workers
@ 2026-03-05 16:15 Breno Leitao
  2026-03-05 16:15 ` [PATCH v2 1/5] workqueue: Use POOL_BH instead of WQ_BH when checking pool flags Breno Leitao
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Breno Leitao @ 2026-03-05 16:15 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, Andrew Morton
  Cc: linux-kernel, Omar Sandoval, Song Liu, Danielle Costantino,
	kasan-dev, Petr Mladek, kernel-team, Breno Leitao

There is a blind spot in the workqueue stall detector (namely
show_cpu_pool_hog()). It only prints workers whose task_is_running() is
true, so a busy worker that is sleeping (e.g. in wait_event_idle())
produces an empty backtrace section even though it is the cause of the
stall.

Additionally, when the watchdog does report stalled pools, the output
doesn't show how long each in-flight work item has been running, making
it harder to identify which specific worker is stuck.

Example output from the sample module:

    BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 132s!
    Showing busy workqueues and worker pools:
    workqueue events: flags=0x100
        pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=4 refcnt=5
        in-flight: 178:stall_work1_fn [wq_stall]
        pending: stall_work2_fn [wq_stall], free_obj_work, psi_avgs_work
	...
    Showing backtraces of running workers in stalled
    CPU-bound worker pools:
        <nothing here>

I have seen this happen on real machines, producing stall reports with
no backtrace at all. This is one of the code paths:

  1) kfence executes toggle_allocation_gate() as a delayed workqueue
     item (kfence_timer) on the system WQ.

  2) toggle_allocation_gate() enables a static key, which IPIs every
     CPU to patch code:
          static_branch_enable(&kfence_allocation_key);

  3) toggle_allocation_gate() then sleeps in TASK_IDLE waiting for a
     kfence allocation to occur:
          wait_event_idle(allocation_wait,
                  atomic_read(&kfence_allocation_gate) > 0 || ...);

     This can last indefinitely if no allocation goes through the
     kfence path (or if IPIing all the CPUs takes longer, which is
     common on platforms without NMI support).

     The worker remains in the pool's busy_hash (in-flight) but is no
     longer task_is_running().

  4) The workqueue watchdog detects the stall and calls
     show_cpu_pool_hog(), which only prints backtraces for workers
     that are actively running on CPU:

          static void show_cpu_pool_hog(struct worker_pool *pool)
          {
                  ...
                  if (task_is_running(worker->task))
                          sched_show_task(worker->task);
          }

  5) Nothing is printed because the offending worker is in TASK_IDLE
     state. The output shows "Showing backtraces of running workers in
     stalled CPU-bound worker pools:" followed by nothing, effectively
     hiding the actual culprit.

Since I use this detector a lot, I am also proposing some additional
improvements here.

This series addresses these issues:

Patch 1 fixes a minor semantic inconsistency where pool flags were
checked against a workqueue-level constant (WQ_BH instead of POOL_BH).
No behavioral change since both constants have the same value.

Patch 2 renames pool->watchdog_ts to pool->last_progress_ts to better
describe what the timestamp actually tracks.

Patch 3 adds a current_start timestamp to struct worker, recording when
a work item began executing. This is printed in show_pwq() as elapsed
wall-clock time (e.g., "in-flight: 165:stall_work_fn [wq_stall] for
100s"), giving immediate visibility into how long each worker has been
busy.

Patch 4 removes the task_is_running() filter from show_cpu_pool_hog()
so that every in-flight worker in the pool's busy_hash is dumped. This
catches workers that are busy but sleeping or blocked, which were
previously invisible in the watchdog output.

With this series applied, the stall output shows a backtrace for every
busy worker, and how long each work item has been stalled. Example:

	 BUG: workqueue lockup - pool cpus=14 node=0 flags=0x0 nice=0 stuck for 42s!
	 Showing busy workqueues and worker pools:
	 workqueue events: flags=0x100
	   pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
	     pending: vmstat_shepherd
	   pwq 58: cpus=14 node=0 flags=0x0 nice=0 active=4 refcnt=5
	     in-flight: 184:stall_work1_fn [wq_stall] for 39s
 	 ...
	 Showing backtraces of busy workers in stalled CPU-bound worker pools:
	 pool 58:
	 task:kworker/14:1    state:I stack:0     pid:184 tgid:184   ppid:2      task_flags:0x4208040 flags:0x00080000
	 Call Trace:
	  <TASK>
	  __schedule+0x1521/0x5360
	  schedule+0x165/0x350
	  stall_work1_fn+0x17f/0x250 [wq_stall]
	  ...

---
Changes in v2:
- Drop the task_running() filter in show_cpu_pool_hog() instead of assuming a
  work item cannot stay running forever.
- Add a sample module to exercise the stall detector
- Link to v1: https://patch.msgid.link/20260211-wqstall_start-at-v1-0-bd9499a18c19@debian.org

---
Breno Leitao (5):
      workqueue: Use POOL_BH instead of WQ_BH when checking pool flags
      workqueue: Rename pool->watchdog_ts to pool->last_progress_ts
      workqueue: Show in-flight work item duration in stall diagnostics
      workqueue: Show all busy workers in stall diagnostics
      workqueue: Add stall detector sample module

 kernel/workqueue.c                          | 47 +++++++-------
 kernel/workqueue_internal.h                 |  1 +
 samples/workqueue/stall_detector/Makefile   |  1 +
 samples/workqueue/stall_detector/wq_stall.c | 98 +++++++++++++++++++++++++++++
 4 files changed, 124 insertions(+), 23 deletions(-)
---
base-commit: c107785c7e8dbabd1c18301a1c362544b5786282
change-id: 20260210-wqstall_start-at-e7319a005ab4

Best regards,
--  
Breno Leitao <leitao@debian.org>




Thread overview: 24+ messages
2026-03-05 16:15 [PATCH v2 0/5] workqueue: Detect stalled in-flight workers Breno Leitao
2026-03-05 16:15 ` [PATCH v2 1/5] workqueue: Use POOL_BH instead of WQ_BH when checking pool flags Breno Leitao
2026-03-05 17:13   ` Song Liu
2026-03-05 16:15 ` [PATCH v2 2/5] workqueue: Rename pool->watchdog_ts to pool->last_progress_ts Breno Leitao
2026-03-05 17:16   ` Song Liu
2026-03-05 16:15 ` [PATCH v2 3/5] workqueue: Show in-flight work item duration in stall diagnostics Breno Leitao
2026-03-05 17:17   ` Song Liu
2026-03-05 16:15 ` [PATCH v2 4/5] workqueue: Show all busy workers " Breno Leitao
2026-03-05 17:17   ` Song Liu
2026-03-12 17:03   ` Petr Mladek
2026-03-13 12:57     ` Breno Leitao
2026-03-13 16:27       ` Petr Mladek
2026-03-18 11:31         ` Breno Leitao
2026-03-18 15:11           ` Petr Mladek
2026-03-20 10:41             ` Breno Leitao
2026-03-05 16:15 ` [PATCH v2 5/5] workqueue: Add stall detector sample module Breno Leitao
2026-03-05 17:25   ` Song Liu
2026-03-05 17:39 ` [PATCH v2 0/5] workqueue: Improve stall diagnostics Tejun Heo
2026-03-12 16:38 ` [PATCH v2 0/5] workqueue: Detect stalled in-flight workers Petr Mladek
2026-03-13 12:24   ` Breno Leitao
2026-03-13 14:38     ` Petr Mladek
2026-03-13 17:36       ` Breno Leitao
2026-03-18 16:46         ` Petr Mladek
2026-03-20 10:44           ` Breno Leitao
