From: Srikar Dronamraju <srikar@linux.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: Boqun Feng <boqun@kernel.org>, Vasily Gorbik <gor@linux.ibm.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	samir@linux.ibm.com
Subject: Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
Date: Wed, 29 Apr 2026 20:30:38 +0530
Message-ID: <afIdFgDD9w2U6hZy@linux.ibm.com>
In-Reply-To: <adlHKowvhn8AGXCc@slm.duckdns.org>

* Tejun Heo <tj@kernel.org> [2026-04-10 08:53:30]:

Hi Tejun,

[ copying Samir Mulani to this thread ]

> Hello,
> 
> > Seems that we (mostly Paul) have our own trick to track whether a CPU
> > has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> > used it in his fix [1]. And I think it won't be that hard to copy it
> > into workqueue and let queue_work_on() use it so that if the user queues
> > a work on a never-onlined CPU, it can detect it (with a warning?) and do
> > something?
> 
> The easiest way to do this is just creating the initial workers for all
> possible pools. Please see below. However, the downside is that it's going
> to create all workers for all possible cpus. This isn't a problem for
> anybody else but these IBM mainframes often come up with a lot of possible
> but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> be negligible on some configurations.
> 
> IBM folks, is that okay?

Even on PowerPC LPARs, it's not uncommon to have possible CPUs != online CPUs
at boot. However, your approach will work.

Samir has also tested this already and reported the results here:
https://lkml.kernel.org/r/1b89c25b-7c1d-4ed8-adf3-ac504b6f086a@linux.ibm.com

> 
> Also, why do you need to queue work items on an offline CPU? Do they
> actually have to be per-cpu? Can you get away with using an unbound
> workqueue?
> 
> Thanks.
> 
> From: Tejun Heo <tj@kernel.org>
> Subject: workqueue: Create workers for all possible CPUs on init
> 
> Per-CPU worker pools are initialized for every possible CPU during early boot,
> but workqueue_init() only creates initial workers for online CPUs. On systems
> where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
> 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
> set but no workers. Any work item queued on such a CPU hangs indefinitely.
> 
> This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") which made SRCU schedule callbacks on all possible CPUs
> during size transitions, triggering workqueue lockup warnings for all
> never-onlined CPUs.
> 
> Create workers for all possible CPUs during init, not just online ones. For
> online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
> worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
> remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
> execute on any CPU. When the CPU later comes online, rebind_workers() handles
> the transition to associated operation as usual.
> 

With this patch, if a CPU has been onlined once, it should be OK to queue
work on that CPU even if it is offline now.

> Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Boqun Feng <boqun@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>

Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>

> ---
>  kernel/workqueue.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -8068,9 +8068,10 @@ void __init workqueue_init(void)
>  		for_each_bh_worker_pool(pool, cpu)
>  			BUG_ON(!create_worker(pool));
> 
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>  		for_each_cpu_worker_pool(pool, cpu) {
> -			pool->flags &= ~POOL_DISASSOCIATED;
> +			if (cpu_online(cpu))
> +				pool->flags &= ~POOL_DISASSOCIATED;
>  			BUG_ON(!create_worker(pool));
>  		}
>  	}
> -- 
> tejun

-- 
Thanks and Regards
Srikar Dronamraju
