The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Breno Leitao <leitao@debian.org>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	Song Liu <song@kernel.org>,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH RFC 3/3] workqueue: dump the last woken worker for stalled pools
Date: Fri, 19 Jun 2026 17:40:42 +0200	[thread overview]
Message-ID: <ajVi-v7LojSfeGLd@pathway.suse.cz> (raw)
In-Reply-To: <20260616-wq_dump_petr-v1-3-b57473ca6d18@debian.org>

On Tue 2026-06-16 09:44:41, Breno Leitao wrote:
> To identify the task most likely responsible for a stall, add
> last_woken_worker (L: pool->lock) to worker_pool and record it in
> kick_pool() just before wake_up_process().  This captures the idle
> worker that was kicked to take over when the last running worker went to
> sleep; if the pool is now stuck with no running worker, that task is the
> prime suspect and its backtrace is dumped by show_pool_no_running_worker().
> 
> Using struct worker * rather than struct task_struct * avoids any
> lifetime concern: workers are only destroyed via set_worker_dying()
> which requires pool->lock, and set_worker_dying() clears
> last_woken_worker when the dying worker matches.
> show_cpu_pool_busy_workers() holds pool->lock while calling
> sched_show_task(), so last_woken_worker is either NULL or points to a
> live worker with a valid task.  More precisely, set_worker_dying() clears
> last_woken_worker before setting WORKER_DIE, so a non-NULL
> last_woken_worker means the kthread has not yet exited and worker->task
> is still alive.
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -226,6 +226,7 @@ struct worker_pool {
>  						/* L: hash of busy workers */
>  
>  	struct worker		*manager;	/* L: purely informational */
> +	struct worker		*last_woken_worker; /* L: last worker woken by kick_pool() */

I thought more about it. The "last_woken_worker" and "manager" are
related and they might eventually duplicate the information.

If I get it correctly then kick_pool() wakes a worker when needed.
The last worker becomes a "manager" and tries to create a new
worker.

IMHO, in most situations "manager" will have the same value as
"last_woken_worker". But it is not guaranteed because "pool->lock"
is not taken all the time.

There are two questions:

1. Do we need both values?

IMHO, we do:

  + "last_woken_worker" is the last woken worker. It is supposed to
    guarantee the forward progress. The backtrace is interesting
    because it can never get scheduled.

  + "manager" is the last "idle" worker which is actively trying to
    create a new worker. It is supposed to guarantee forward progress
    too. IMHO, it usually will be the "last_woken_worker" but it is
    not guaranteed as mentioned above.

2. Should we print backtrace of both?

Probably not both at the same time:

  + We should print "manager" when it is set. It is set when a new
    worker has to be created. And the "manager" is responsible for
    the forward progress, definitely.

   + We should print "last_woken_worker" when "manager" is not set.
     It is the only clue. And it likely got stuck for some reasons.

   + IMHO, "last_woken_worker" need not be printed when "manager"
     is set even when it is a different worker. The "manager" is
     the really responsible worker. And "last_woken_worker" likely
     just started processing work items because it somehow raced with
     the manager.

Does this make sense, please?

We could also add the "manager" printing in a separate patch later.
This patch is a good step forward. Feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

      reply	other threads:[~2026-06-19 15:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-16 16:44 [PATCH RFC 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
2026-06-16 16:44 ` [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
2026-06-19 12:58   ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
2026-06-19 13:42   ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 3/3] workqueue: dump the last woken worker " Breno Leitao
2026-06-19 15:40   ` Petr Mladek [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajVi-v7LojSfeGLd@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=jiangshanlai@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox