From: Petr Mladek <pmladek@suse.com>
To: Breno Leitao <leitao@debian.org>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
Song Liu <song@kernel.org>,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH RFC 3/3] workqueue: dump the last woken worker for stalled pools
Date: Fri, 19 Jun 2026 17:40:42 +0200 [thread overview]
Message-ID: <ajVi-v7LojSfeGLd@pathway.suse.cz> (raw)
In-Reply-To: <20260616-wq_dump_petr-v1-3-b57473ca6d18@debian.org>
On Tue 2026-06-16 09:44:41, Breno Leitao wrote:
> To identify the task most likely responsible for a stall, add
> last_woken_worker (L: pool->lock) to worker_pool and record it in
> kick_pool() just before wake_up_process(). This captures the idle
> worker that was kicked to take over when the last running worker went to
> sleep; if the pool is now stuck with no running worker, that task is the
> prime suspect and its backtrace is dumped by show_pool_no_running_worker().
>
> Using struct worker * rather than struct task_struct * avoids any
> lifetime concern: workers are only destroyed via set_worker_dying()
> which requires pool->lock, and set_worker_dying() clears
> last_woken_worker when the dying worker matches.
> show_cpu_pool_busy_workers() holds pool->lock while calling
> sched_show_task(), so last_woken_worker is either NULL or points to a
> live worker with a valid task. More precisely, set_worker_dying() clears
> last_woken_worker before setting WORKER_DIE, so a non-NULL
> last_woken_worker means the kthread has not yet exited and worker->task
> is still alive.
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -226,6 +226,7 @@ struct worker_pool {
> /* L: hash of busy workers */
>
> struct worker *manager; /* L: purely informational */
> + struct worker *last_woken_worker; /* L: last worker woken by kick_pool() */
I thought more about it. The "last_woken_worker" and "manager" are
related and they might eventually duplicate the information.
If I get it correctly then kick_pool() wakes a worker when needed.
The last worker becomes a "manager" and tries to create a new
worker.
IMHO, in most situations "manager" will have the same value as
"last_woken_worker". But it is not guaranteed because "pool->lock"
is not taken all the time.
There are two questions:
1. Do we need both values?
IMHO, we do:
+ "last_woken_worker" is the last woken worker. It is supposed to
guarantee the forward progress. The backtrace is interesting
because it can never get scheduled.
+ "manager" is the last "idle" worker which is actively trying to
create a new worker. It is supposed to guarantee forward progress
too. IMHO, it usually will be the "last_woken_worker" but it is
not guaranteed as mentioned above.
2. Should we print backtrace of both?
Probably not both at the same time:
+ We should print "manager" when it is set. It is set when a new
worker has to be created. And the "manager" is responsible for
the forward progress, definitely.
+ We should print "last_woken_worker" when "manager" is not set.
It is the only clue. And it likely got stuck for some reasons.
+ IMHO, "last_woken_worker" need not be printed when "manager"
is set even when it is a different worker. The "manager" is
the really responsible worker. And "last_woken_worker" likely
just started processing work items because it somehow raced with
the manager.
Does this make sense, please?
We could also add the "manager" printing in a separate patch later.
This patch is a good step forward. Feel free to use:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
prev parent reply other threads:[~2026-06-19 15:40 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-16 16:44 [PATCH RFC 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
2026-06-16 16:44 ` [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
2026-06-19 12:58 ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
2026-06-19 13:42 ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 3/3] workqueue: dump the last woken worker " Breno Leitao
2026-06-19 15:40 ` Petr Mladek [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ajVi-v7LojSfeGLd@pathway.suse.cz \
--to=pmladek@suse.com \
--cc=jiangshanlai@gmail.com \
--cc=kernel-team@meta.com \
--cc=leitao@debian.org \
--cc=linux-kernel@vger.kernel.org \
--cc=song@kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox