All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Breno Leitao <leitao@debian.org>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	Song Liu <song@kernel.org>,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH RFC 3/3] workqueue: dump the last woken worker for stalled pools
Date: Fri, 19 Jun 2026 17:40:42 +0200	[thread overview]
Message-ID: <ajVi-v7LojSfeGLd@pathway.suse.cz> (raw)
In-Reply-To: <20260616-wq_dump_petr-v1-3-b57473ca6d18@debian.org>

On Tue 2026-06-16 09:44:41, Breno Leitao wrote:
> To identify the task most likely responsible for a stall, add
> last_woken_worker (L: pool->lock) to worker_pool and record it in
> kick_pool() just before wake_up_process().  This captures the idle
> worker that was kicked to take over when the last running worker went to
> sleep; if the pool is now stuck with no running worker, that task is the
> prime suspect and its backtrace is dumped by show_pool_no_running_worker().
> 
> Using struct worker * rather than struct task_struct * avoids any
> lifetime concern: workers are only destroyed via set_worker_dying()
> which requires pool->lock, and set_worker_dying() clears
> last_woken_worker when the dying worker matches.
> show_cpu_pool_busy_workers() holds pool->lock while calling
> sched_show_task(), so last_woken_worker is either NULL or points to a
> live worker with a valid task.  More precisely, set_worker_dying() clears
> last_woken_worker before setting WORKER_DIE, so a non-NULL
> last_woken_worker means the kthread has not yet exited and worker->task
> is still alive.
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -226,6 +226,7 @@ struct worker_pool {
>  						/* L: hash of busy workers */
>  
>  	struct worker		*manager;	/* L: purely informational */
> +	struct worker		*last_woken_worker; /* L: last worker woken by kick_pool() */

I thought more about it. The "last_woken_worker" and "manager" are
related and they might eventually duplicate the information.

If I get it correctly then kick_pool() wakes a worker when needed.
The last worker becomes a "manager" and tries to create a new
worker.

IMHO, in most situations "manager" will have the same value as
"last_woken_worker". But it is not guaranteed because "pool->lock"
is not taken all the time.

There are two questions:

1. Do we need both values?

IMHO, we do:

  + "last_woken_worker" is the last woken worker. It is supposed to
    guarantee the forward progress. The backtrace is interesting
    because it can never get scheduled.

  + "manager" is the last "idle" worker which is actively trying to
    create a new worker. It is supposed to guarantee forward progress
    too. IMHO, it usually will be the "last_woken_worker" but it is
    not guaranteed as mentioned above.

2. Should we print backtrace of both?

Probably not both at the same time:

  + We should print "manager" when it is set. It is set when a new
    worker has to be created. And the "manager" is responsible for
    the forward progress, definitely.

   + We should print "last_woken_worker" when "manager" is not set.
     It is the only clue. And it likely got stuck for some reasons.

   + IMHO, "last_woken_worker" need not be printed when "manager"
     is set even when it is a different worker. The "manager" is
     the really responsible worker. And "last_woken_worker" likely
     just started processing work items because it somehow raced with
     the manager.

Does this make sense, please?

We could also add the "manager" printing in a separate patch later.
This patch is a good step forward. Feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

      reply	other threads:[~2026-06-19 15:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-16 16:44 [PATCH RFC 0/3] workqueue: improve stall diagnostics for pools with no running worker Breno Leitao
2026-06-16 16:44 ` [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics Breno Leitao
2026-06-19 12:58   ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 2/3] workqueue: trigger a single-CPU backtrace for stalled pools Breno Leitao
2026-06-19 13:42   ` Petr Mladek
2026-06-16 16:44 ` [PATCH RFC 3/3] workqueue: dump the last woken worker " Breno Leitao
2026-06-19 15:40   ` Petr Mladek [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajVi-v7LojSfeGLd@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=jiangshanlai@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.