From: Breno Leitao <leitao@debian.org>
To: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
Omar Sandoval <osandov@osandov.com>, Song Liu <song@kernel.org>,
Danielle Costantino <dcostantino@meta.com>,
kasan-dev@googlegroups.com, kernel-team@meta.com
Subject: Re: [PATCH v2 0/5] workqueue: Detect stalled in-flight workers
Date: Fri, 13 Mar 2026 05:24:54 -0700
Message-ID: <abP8wDhYWwk3ufmA@gmail.com>
In-Reply-To: <abLsAi7_fU5FrYiF@pathway.suse.cz>
Hello Petr,
On Thu, Mar 12, 2026 at 05:38:26PM +0100, Petr Mladek wrote:
> On Thu 2026-03-05 08:15:36, Breno Leitao wrote:
> > A blind spot exists in the workqueue stall detector (aka
> > show_cpu_pool_hog()). It only prints workers whose task_is_running() is
> > true, so a busy worker that is sleeping (e.g. wait_event_idle())
> > produces an empty backtrace section even though it is the cause of the
> > stall.
> >
> > Additionally, when the watchdog does report stalled pools, the output
> > doesn't show how long each in-flight work item has been running, making
> > it harder to identify which specific worker is stuck.
> >
> > Example output from the sample module:
> >
> > BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 132s!
> > Showing busy workqueues and worker pools:
> > workqueue events: flags=0x100
> > pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=4 refcnt=5
> > in-flight: 178:stall_work1_fn [wq_stall]
> > pending: stall_work2_fn [wq_stall], free_obj_work, psi_avgs_work
> > ...
> > Showing backtraces of running workers in stalled
> > CPU-bound worker pools:
> > <nothing here>
> >
> > I see this happening on real machines, causing some stalls that don't
> > have any backtrace. This is one of the code paths:
> >
> > 1) kfence executes toggle_allocation_gate() as a delayed workqueue
> > item (kfence_timer) on the system WQ.
> >
> > 2) toggle_allocation_gate() enables a static key, which IPIs every
> > CPU to patch code:
> > static_branch_enable(&kfence_allocation_key);
> >
> > 3) toggle_allocation_gate() then sleeps in TASK_IDLE waiting for a
> > kfence allocation to occur:
> > wait_event_idle(allocation_wait,
> > atomic_read(&kfence_allocation_gate) > 0 || ...);
> >
> > This can last indefinitely if no allocation goes through the
> > kfence path (or IPIing all the CPUs takes longer, which is common on
> > platforms that do not have NMI).
> >
> > The worker remains in the pool's busy_hash
> > (in-flight) but is no longer task_is_running().
> >
> > 4) The workqueue watchdog detects the stall and calls
> > show_cpu_pool_hog(), which only prints backtraces for workers
> > that are actively running on CPU:
> >
> > static void show_cpu_pool_hog(struct worker_pool *pool) {
> > ...
> > if (task_is_running(worker->task))
> > sched_show_task(worker->task);
> > }
> >
> > 5) Nothing is printed because the offending worker is in TASK_IDLE
> > state. The output shows "Showing backtraces of running workers in
> > stalled CPU-bound worker pools:" followed by nothing, effectively
> > hiding the actual culprit.
>
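To make the blind spot in steps 4 and 5 concrete, here is a minimal userspace sketch (not kernel code) of the filter. The model_state enum, struct model_worker and both helper names are made up for illustration. The only things taken from the report above are the task_is_running() check and worker 178 sleeping in TASK_IDLE. The "all busy" variant is only my reading of the intent behind showing every busy worker, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the task states relevant here. */
enum model_state { MODEL_RUNNING, MODEL_IDLE_SLEEP };

/* Hypothetical, simplified view of one in-flight (busy) worker. */
struct model_worker {
	int pid;
	enum model_state state;
};

/*
 * Models the existing check: a backtrace is dumped only for busy
 * workers that are currently in the running state.
 */
static size_t count_dumped_running_only(const struct model_worker *w, size_t n)
{
	size_t dumped = 0;

	assert(w || n == 0);
	for (size_t i = 0; i < n; i++)
		if (w[i].state == MODEL_RUNNING)
			dumped++;
	return dumped;
}

/*
 * Models the alternative: every busy worker gets a backtrace,
 * whatever state it is in.
 */
static size_t count_dumped_all_busy(const struct model_worker *w, size_t n)
{
	assert(w || n == 0);
	return n;
}
```

With a single in-flight worker in MODEL_IDLE_SLEEP (like pid 178 above), the running-only filter dumps nothing, while the filter that walks every busy worker dumps it.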
> I am trying to better understand the situation. There was a reason
> why only the worker in the running state was shown.
>
> Normally, a sleeping worker should not cause a stall. The scheduler calls
> wq_worker_sleeping() which should wake up another idle worker. There is
> always at least one idle worker in the pool. It should start processing
> the next pending work. Or it should fork another worker when it was
> the last idle one.
Right, but let's look at this case:
BUG: workqueue lockup - pool 55 cpu 13 curr 0 (swapper/13) stack ffff800085640000 cpus=13 node=0 flags=0x0 nice=-20 stuck for 679s!
work func=blk_mq_timeout_work data=0xffff0000ad7e3a05
Showing busy workqueues and worker pools:
workqueue events_unbound: flags=0x2
pwq 288: cpus=0-71 flags=0x4 nice=0 active=1 refcnt=2
in-flight: 4083734:btrfs_extent_map_shrinker_worker
workqueue mm_percpu_wq: flags=0x8
pwq 14: cpus=3 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: vmstat_update
pool 288: cpus=0-71 flags=0x4 nice=0 hung=0s workers=17 idle: 3800629 3959700 3554824 3706405 3759881 4065549 4041361 4065548 1715676 4086805 3860852 3587585 4065550 4014041 3944711 3744484
Showing backtraces of running workers in stalled CPU-bound worker pools:
# Nothing in here
It seems CPU 13 is idle (curr = 0) and blk_mq_timeout_work has been pending for
679s?
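For triaging reports like the one above, a small userspace helper along these lines can pull the interesting fields out of the header line. This is only a sketch: struct lockup_hdr and parse_lockup_hdr() are hypothetical names, and the field layout is assumed from the sample lines in this thread, where the "curr" field is not always present.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields pulled out of one "BUG: workqueue lockup" header line. */
struct lockup_hdr {
	char cpus[32];           /* CPU list as printed, e.g. "13" or "0-71" */
	int nice;                /* pool nice value */
	unsigned int stuck_secs; /* the N in "stuck for Ns!" */
	int curr_pid;            /* "curr" pid, or -1 if the field is absent */
};

/*
 * Parse a lockup header line. Returns 0 on success, or -1 if the line
 * does not look like a lockup header with the fields assumed here.
 */
static int parse_lockup_hdr(const char *line, struct lockup_hdr *out)
{
	const char *p;

	assert(line && out);

	if (!strstr(line, "BUG: workqueue lockup"))
		return -1;

	p = strstr(line, " cpus=");
	if (!p || sscanf(p, " cpus=%31s", out->cpus) != 1)
		return -1;

	p = strstr(line, " nice=");
	if (!p || sscanf(p, " nice=%d", &out->nice) != 1)
		return -1;

	p = strstr(line, "stuck for ");
	if (!p || sscanf(p, "stuck for %us", &out->stuck_secs) != 1)
		return -1;

	/* Optional field: pid 0 is the idle task (swapper). */
	p = strstr(line, " curr ");
	if (!p || sscanf(p, " curr %d", &out->curr_pid) != 1)
		out->curr_pid = -1;

	return 0;
}
```

Fed the header line from the report above, this should yield cpus "13", nice -20, 679 seconds and curr pid 0, which is what makes me read CPU 13 as idle.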
> I wonder what blocked the idle worker from waking or forking
> a new worker. Was it caused by the IPIs?
Not sure. Keep in mind that these hosts (arm64) do not have NMI, so IPIs are
just regular interrupts that can take a long time to be handled.
toggle_allocation_gate() was a good example because it was sending IPIs very
frequently, so I used it in the cover letter, but this problem also shows up
in different places (more examples below).
> Did printing the sleeping workers help to analyze the problem?
That is my hope. I don't have a reproducer other than the one in this
patchset.
I am currently rolling this patchset out to production, and I can report back
once I get more information.
> I wonder if we could do better in this case. For example, warn
> that the scheduler failed to wake up another idle worker when
> no worker is in the running state. And maybe, print backtrace
> of the currently running process on the given CPU because it
> likely blocks waking/scheduling the idle worker.
I am happy to improve this, given this has been a hard issue. Let me give more
instances of the "empty" stalls I am seeing, all with empty backtraces:
# Instance 1
BUG: workqueue lockup - pool cpus=33 node=0 flags=0x0 nice=0 stuck for 33s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 134: cpus=33 node=0 flags=0x0 nice=0 active=3 refcnt=4
pending: 3*psi_avgs_work
pwq 218: cpus=54 node=0 flags=0x0 nice=0 active=1 refcnt=2
in-flight: 842:key_garbage_collector
workqueue mm_percpu_wq: flags=0x8
pwq 134: cpus=33 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: vmstat_update
pool 218: cpus=54 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 11200 524627
Showing backtraces of running workers in stalled CPU-bound worker pools:
# Instance 2
BUG: workqueue lockup - pool cpus=53 node=0 flags=0x0 nice=0 stuck for 459s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: psi_avgs_work
pwq 214: cpus=53 node=0 flags=0x0 nice=0 active=4 refcnt=5
pending: 2*psi_avgs_work, drain_local_memcg_stock, iova_depot_work_func
workqueue events_freezable: flags=0x4
pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: pci_pme_list_scan
workqueue slub_flushwq: flags=0x8
pwq 214: cpus=53 node=0 flags=0x0 nice=0 active=1 refcnt=3
pending: flush_cpu_slab BAR(7520)
workqueue mm_percpu_wq: flags=0x8
pwq 214: cpus=53 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: vmstat_update
workqueue mlx5_cmd_0002:03:00.1: flags=0x6000a
pwq 576: cpus=0-143 flags=0x4 nice=0 active=1 refcnt=146
pending: cmd_work_handler
Showing backtraces of running workers in stalled CPU-bound worker pools:
# Instance 3
BUG: workqueue lockup - pool cpus=74 node=1 flags=0x0 nice=0 stuck for 31s!
Showing busy workqueues and worker pools:
workqueue mm_percpu_wq: flags=0x8
pwq 298: cpus=74 node=1 flags=0x0 nice=0 active=1 refcnt=2
pending: vmstat_update
Showing backtraces of running workers in stalled CPU-bound worker pools:
# Instance 4
BUG: workqueue lockup - pool cpus=71 node=0 flags=0x0 nice=0 stuck for 32s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 286: cpus=71 node=0 flags=0x0 nice=0 active=2 refcnt=3
pending: psi_avgs_work, fuse_check_timeout
workqueue events_freezable: flags=0x4
pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: pci_pme_list_scan
workqueue mm_percpu_wq: flags=0x8
pwq 286: cpus=71 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: vmstat_update
Showing backtraces of running workers in stalled CPU-bound worker pools:
Thanks for your help,
--breno