From: Hillf Danton <hdanton@sina.com>
To: Breno Leitao <leitao@debian.org>
Cc: Petr Mladek <pmladek@suse.com>, Tejun Heo <tj@kernel.org>,
	linux-kernel@vger.kernel.org, Omar Sandoval <osandov@osandov.com>,
	Danielle Costantino <dcostantino@meta.com>,
	kasan-dev@googlegroups.com
Subject: Re: [PATCH v2 0/5] workqueue: Detect stalled in-flight workers
Date: Wed, 13 May 2026 16:57:24 +0800
Message-ID: <20260513085725.597-1-hdanton@sina.com>
In-Reply-To: <abP8wDhYWwk3ufmA@gmail.com>

On Fri, 13 Mar 2026 05:24:54 -0700 Breno Leitao wrote:
> On Thu, Mar 12, 2026 at 05:38:26PM +0100, Petr Mladek wrote:
> > On Thu 2026-03-05 08:15:36, Breno Leitao wrote:
> > > There is a blind spot in the workqueue stall detector (aka
> > > show_cpu_pool_hog()). It only prints workers whose task_is_running() is
> > > true, so a busy worker that is sleeping (e.g. in wait_event_idle())
> > > produces an empty backtrace section even though it is the cause of the
> > > stall.
> > > 
> > > Additionally, when the watchdog does report stalled pools, the output
> > > doesn't show how long each in-flight work item has been running, making
> > > it harder to identify which specific worker is stuck.
> > > 
> > > Example output from the sample code:
> > > 
> > >     BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 132s!
> > >     Showing busy workqueues and worker pools:
> > >     workqueue events: flags=0x100
> > >         pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=4 refcnt=5
> > >         in-flight: 178:stall_work1_fn [wq_stall]
> > >         pending: stall_work2_fn [wq_stall], free_obj_work, psi_avgs_work
> > > 	...
> > >     Showing backtraces of running workers in stalled
> > >     CPU-bound worker pools:
> > >         <nothing here>
> > > 
> > > I see this happening on real machines, causing stalls that don't
> > > have any backtrace. This is one of the code paths:
> > > 
> > >   1) kfence executes toggle_allocation_gate() as a delayed workqueue
> > >      item (kfence_timer) on the system WQ.
> > > 
> > >   2) toggle_allocation_gate() enables a static key, which IPIs every
> > >      CPU to patch code:
> > >           static_branch_enable(&kfence_allocation_key);
> > > 
> > >   3) toggle_allocation_gate() then sleeps in TASK_IDLE waiting for a
> > >      kfence allocation to occur:
> > >           wait_event_idle(allocation_wait,
> > >                   atomic_read(&kfence_allocation_gate) > 0 || ...);
> > > 
> > >      This can last indefinitely if no allocation goes through the
> > >      kfence path (or if IPIing all the CPUs takes longer, which is
> > >      common on platforms that do not have NMI).
> > > 
> > >      The worker remains in the pool's busy_hash
> > >      (in-flight) but is no longer task_is_running().
> > >
> > >   4) The workqueue watchdog detects the stall and calls
> > >      show_cpu_pool_hog(), which only prints backtraces for workers
> > >      that are actively running on CPU:
> > > 
> > >           static void show_cpu_pool_hog(struct worker_pool *pool) {
> > >                   ...
> > >                   if (task_is_running(worker->task))
> > >                           sched_show_task(worker->task);
> > >           }
> > > 
> > >   5) Nothing is printed because the offending worker is in TASK_IDLE
> > >      state. The output shows "Showing backtraces of running workers in
> > >      stalled CPU-bound worker pools:" followed by nothing, effectively
> > >      hiding the actual culprit.
> > 
> > I am trying to better understand the situation. There was a reason
> > why only the worker in the running state was shown.
> > 
> > Normally, a sleeping worker should not cause a stall. The scheduler calls
> > wq_worker_sleeping() which should wake up another idle worker. There is
> > always at least one idle worker in the pool. It should start processing
> > the next pending work. Or it should fork another worker when it was
> > the last idle one.
> 
> Right, but let's look at this case:
> 
> 	 BUG: workqueue lockup - pool 55 cpu 13 curr 0 (swapper/13) stack ffff800085640000 cpus=13 node=0 flags=0x0 nice=-20 stuck for 679s!
> 	  work func=blk_mq_timeout_work data=0xffff0000ad7e3a05
> 	  Showing busy workqueues and worker pools:
> 	  workqueue events_unbound: flags=0x2
> 	    pwq 288: cpus=0-71 flags=0x4 nice=0 active=1 refcnt=2
> 	      in-flight: 4083734:btrfs_extent_map_shrinker_worker
> 	  workqueue mm_percpu_wq: flags=0x8
> 	    pwq 14: cpus=3 node=0 flags=0x0 nice=0 active=1 refcnt=2
> 	      pending: vmstat_update
> 	  pool 288: cpus=0-71 flags=0x4 nice=0 hung=0s workers=17 idle: 3800629 3959700 3554824 3706405 3759881 4065549 4041361 4065548 1715676 4086805 3860852 3587585 4065550 4014041 3944711 3744484
> 	  Showing backtraces of running workers in stalled CPU-bound worker pools:
> 		# Nothing in here
> 
> It seems CPU 13 is idle (curr = 0) and blk_mq_timeout_work has been pending for
> 679s ?
>
An idle CPU failed to process pending work, so the root cause lies outside
the workqueue code, and it is hard to see why giving Peter more X-ray
scans would help when it is Paul who has the bone stuck in his throat.


Thread overview: 7+ messages
     [not found] <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
     [not found] ` <20260305-wqstall_start-at-v2-4-b60863ee0899@debian.org>
2026-05-07 10:20   ` [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics Jiri Slaby
2026-05-07 13:11     ` Breno Leitao
2026-05-11  5:21       ` Jiri Slaby
2026-05-13  7:29         ` Thorsten Leemhuis
2026-05-13  8:03           ` Jiri Slaby
2026-05-13  8:53 ` [PATCH v2 0/5] workqueue: Detect stalled in-flight workers Markus Elfring
     [not found] ` <abLsAi7_fU5FrYiF@pathway.suse.cz>
     [not found]   ` <abP8wDhYWwk3ufmA@gmail.com>
2026-05-13  8:57     ` Hillf Danton [this message]
