BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
@ 2026-04-09 13:08 Vasily Gorbik
  2026-04-09 17:22 ` Paul E. McKenney
  2026-04-09 17:26 ` Boqun Feng
  0 siblings, 2 replies; 32+ messages in thread
From: Vasily Gorbik @ 2026-04-09 13:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
non-preemptible") defers srcu_node tree allocation when called under
raw spinlock, putting SRCU through ~6 transitional grace periods
(SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
per-CPU pools directly - pools for not-online CPUs have no workers,
work accumulates, workqueue lockup detector fires.

Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
SRCU_SIZE_BIG, the mask = ~0 path was never reached.

Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
where possible CPUs > online CPUs is the usual case.
Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)

s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):

  BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
  BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
  ...
  BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
  Showing busy workqueues and worker pools:
  workqueue rcu_gp: flags=0x108
    pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
      pending: 3*srcu_invoke_callbacks
    pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
      pending: 3*srcu_invoke_callbacks
    ...
    pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
      pending: 3*srcu_invoke_callbacks

Not sure if replacing mask = ~0 with something derived from
cpu_online_mask would be racy in that context.

[1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
[2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 13:08 BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition Vasily Gorbik
@ 2026-04-09 17:22 ` Paul E. McKenney
  2026-04-09 19:15   ` Vasily Gorbik
  2026-04-09 17:26 ` Boqun Feng
  1 sibling, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-09 17:22 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") defers srcu_node tree allocation when called under
> raw spinlock, putting SRCU through ~6 transitional grace periods
> (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> per-CPU pools directly - pools for not-online CPUs have no workers,
> work accumulates, workqueue lockup detector fires.
> 
> Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> 
> Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> where possible CPUs > online CPUs is the usual case.
> Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> 
> s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> 
>   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
>   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
>   ...
>   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
>   Showing busy workqueues and worker pools:
>   workqueue rcu_gp: flags=0x108
>     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
>     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
>     ...
>     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
> 
> Not sure if replacing mask = ~0 with something derived from
> cpu_online_mask would be racy in that context.
> 
> [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop

First, thank you for the bug report and apologies for the hassle!
This was a pre-existing bug, but the change made it much more likely
to happen.

Does the alleged (and untested) fix below do the trick?  The theory is
that if a given CPU has ever been fully online, it has workqueues set up.
Directly checking whether a CPU is currently online is vulnerable to a CPU
piling up lots of SRCU callbacks, then going offline.  So we do need to
be prepared to invoke SRCU callbacks for CPUs that are currently offline.

In the meantime, I will start up some tests.  Not that they saw the
bug in the first place, so it is your tests that matter here.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0d01cd8c4b4a7..e68ee7f69e1fc 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -898,7 +898,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 	int cpu;
 
 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1UL << (cpu - snp->grplo))))
+		if (!(mask & (1UL << (cpu - snp->grplo))) || !rcu_cpu_beenfullyonline(cpu))
 			continue;
 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 	}

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:22 ` Paul E. McKenney
@ 2026-04-09 19:15   ` Vasily Gorbik
  2026-04-09 20:10     ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Vasily Gorbik @ 2026-04-09 19:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

On Thu, Apr 09, 2026 at 10:22:00AM -0700, Paul E. McKenney wrote:
> On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > non-preemptible") defers srcu_node tree allocation when called under
> > raw spinlock, putting SRCU through ~6 transitional grace periods
> > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > per-CPU pools directly - pools for not-online CPUs have no workers,
> > work accumulates, workqueue lockup detector fires.
> > 
> > Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> > SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> > 
> > Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> > and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> > where possible CPUs > online CPUs is the usual case.
> > Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> > or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> > or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> > 
> > s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> > 
> >   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   ...
> >   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   Showing busy workqueues and worker pools:
> >   workqueue rcu_gp: flags=0x108
> >     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> >     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> >     ...
> >     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> > 
> > Not sure if replacing mask = ~0 with something derived from
> > cpu_online_mask would be racy in that context.
> > 
> > [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> > [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop
> 
> This was a pre-existing bug, but the change made it much more likely
> to happen.

Yes, indeed.

> Does the alleged (and untested) fix below do the trick?  The theory is
> that if a given CPU has ever been fully online, it has workqueues set up.
> Directly checking whether a CPU is currently online is vulnerable to a CPU
> piling up lots of SRCU callbacks, then going offline.  So we do need to
> be prepared to invoke SRCU callbacks for CPUs that are currently offline.

Yes, tested on s390 LPAR (76 online, 400 possible) as well as
on x86 KVM with --smp 16,maxcpus=255 and CONFIG_NR_CPUS=256
no more workqueue lockup in both cases.

Thank you!

Tested-by: Vasily Gorbik <gor@linux.ibm.com>

> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 0d01cd8c4b4a7..e68ee7f69e1fc 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -898,7 +898,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>  	int cpu;
>  
>  	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> -		if (!(mask & (1UL << (cpu - snp->grplo))))
> +		if (!(mask & (1UL << (cpu - snp->grplo))) || !rcu_cpu_beenfullyonline(cpu))
>  			continue;
>  		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>  	}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 19:15   ` Vasily Gorbik
@ 2026-04-09 20:10     ` Paul E. McKenney
  2026-04-10  4:03       ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-09 20:10 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

On Thu, Apr 09, 2026 at 09:15:50PM +0200, Vasily Gorbik wrote:
> On Thu, Apr 09, 2026 at 10:22:00AM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > non-preemptible") defers srcu_node tree allocation when called under
> > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > work accumulates, workqueue lockup detector fires.
> > > 
> > > Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> > > SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> > > 
> > > Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> > > and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> > > where possible CPUs > online CPUs is the usual case.
> > > Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> > > or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> > > or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> > > 
> > > s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> > > 
> > >   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
> > >   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
> > >   ...
> > >   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
> > >   Showing busy workqueues and worker pools:
> > >   workqueue rcu_gp: flags=0x108
> > >     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
> > >       pending: 3*srcu_invoke_callbacks
> > >     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
> > >       pending: 3*srcu_invoke_callbacks
> > >     ...
> > >     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
> > >       pending: 3*srcu_invoke_callbacks
> > > 
> > > Not sure if replacing mask = ~0 with something derived from
> > > cpu_online_mask would be racy in that context.
> > > 
> > > [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> > > [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop
> > 
> > This was a pre-existing bug, but the change made it much more likely
> > to happen.
> 
> Yes, indeed.
> 
> > Does the alleged (and untested) fix below do the trick?  The theory is
> > that if a given CPU has ever been fully online, it has workqueues set up.
> > Directly checking whether a CPU is currently online is vulnerable to a CPU
> > piling up lots of SRCU callbacks, then going offline.  So we do need to
> > be prepared to invoke SRCU callbacks for CPUs that are currently offline.
> 
> Yes, tested on s390 LPAR (76 online, 400 possible) as well as
> on x86 KVM with --smp 16,maxcpus=255 and CONFIG_NR_CPUS=256
> no more workqueue lockup in both cases.
> 
> Thank you!
> 
> Tested-by: Vasily Gorbik <gor@linux.ibm.com>

Thank you for testing this!

Please see below for an updated patch.  Tejun's patch might obsolete
this one, but just in case he balks at SRCU queueing handlers for CPUs
that are not even in the cpu_possible_mask.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit dcc14db7e76af899f1ff4606ec4316580d7b6f88
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Apr 9 11:16:02 2026 -0700

    srcu: Don't queue workqueue handlers to never-online CPUs
    
    While an srcu_struct structure is in the midst of switching from CPU-0
    to all-CPUs state, it can attempt to invoke callbacks for CPUs that
    have never been online.  Worse yet, it can attempt in invoke callbacks
    for CPUs that never will be online due to not being present in the
    cpu_possible_mask.  This can cause hangs on s390, which is not set up to
    deal with workqueue handlers being scheduled on such CPUs.  This commit
    therefore causes Tree SRCU to refrain from queueing workqueue handlers
    on CPUs that have not yet (and might never) come online.
    
    Reported-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Tested-by: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Tejun Heo <tj@kernel.org>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0d01cd8c4b4a7..a67af44fc0745 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 {
 	int cpu;
 
-	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1UL << (cpu - snp->grplo))))
-			continue;
-		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
-	}
+	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
+		if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
+			srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 20:10     ` Paul E. McKenney
@ 2026-04-10  4:03       ` Paul E. McKenney
  2026-04-14 19:24         ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-10  4:03 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

On Thu, Apr 09, 2026 at 01:10:14PM -0700, Paul E. McKenney wrote:
> On Thu, Apr 09, 2026 at 09:15:50PM +0200, Vasily Gorbik wrote:

[ . . . ]

> > Yes, tested on s390 LPAR (76 online, 400 possible) as well as
> > on x86 KVM with --smp 16,maxcpus=255 and CONFIG_NR_CPUS=256
> > no more workqueue lockup in both cases.
> > 
> > Thank you!
> > 
> > Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> 
> Thank you for testing this!
> 
> Please see below for an updated patch.  Tejun's patch might obsolete
> this one, but just in case he balks at SRCU queueing handlers for CPUs
> that are not even in the cpu_possible_mask.  ;-)

And because we don't invoke SRCU callbacks on CPUs that are not yet fully
online, such CPUs had better not invoke call_srcu(), synchronize_srcu(),
or synchronize_srcu_expedited() on a CPU that is not yet fully online.
I am therefore adding the warning shown below.

Better paranoid late than paranoid not at all.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index a67af44fc0745..d62509efb52f5 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1431,6 +1431,7 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
 static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
 			rcu_callback_t func, bool do_norm)
 {
+	WARN_ON_ONCE(!rcu_cpu_beenfullyonline(raw_smp_processor_id()));
 	if (debug_rcu_head_queue(rhp)) {
 		/* Probable double call_srcu(), so leak the callback. */
 		WRITE_ONCE(rhp->func, srcu_leak_callback);

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-10  4:03       ` Paul E. McKenney
@ 2026-04-14 19:24         ` Paul E. McKenney
  2026-04-29 17:50           ` Vasily Gorbik
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-14 19:24 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

On Thu, Apr 09, 2026 at 09:03:26PM -0700, Paul E. McKenney wrote:
> On Thu, Apr 09, 2026 at 01:10:14PM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 09, 2026 at 09:15:50PM +0200, Vasily Gorbik wrote:
> 
> [ . . . ]
> 
> > > Yes, tested on s390 LPAR (76 online, 400 possible) as well as
> > > on x86 KVM with --smp 16,maxcpus=255 and CONFIG_NR_CPUS=256
> > > no more workqueue lockup in both cases.
> > > 
> > > Thank you!
> > > 
> > > Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> > 
> > Thank you for testing this!
> > 
> > Please see below for an updated patch.  Tejun's patch might obsolete
> > this one, but just in case he balks at SRCU queueing handlers for CPUs
> > that are not even in the cpu_possible_mask.  ;-)
> 
> And because we don't invoke SRCU callbacks on CPUs that are not yet fully
> online, such CPUs had better not invoke call_srcu(), synchronize_srcu(),
> or synchronize_srcu_expedited() on a CPU that is not yet fully online.
> I am therefore adding the warning shown below.
> 
> Better paranoid late than paranoid not at all.  ;-)

Except that to make that check actually work, additional code is needed.
So much so that it is easier to make call_srcu() from not-yet-onlined
CPUs work properly, the trick being to queue the callback onto the boot
CPU's callback queue.  That way, there is no need to invoke callbacks
queued on CPUs that cannot yet run workqueue handlers.

Please see below for the full patch, including refraining from queueing
workqueue handlers on not-yet-online CPUs and diverting SRCU callbacks
from not-yet-fully-online CPUs to the boot CPU's callback queue.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit ce533a60b2ef29a9b516cc717e77c6b679bc09c0
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Apr 9 11:16:02 2026 -0700

    srcu: Don't queue workqueue handlers to never-online CPUs
    
    While an srcu_struct structure is in the midst of switching from CPU-0
    to all-CPUs state, it can attempt to invoke callbacks for CPUs that
    have never been online.  Worse yet, it can attempt in invoke callbacks
    for CPUs that never will be online due to not being present in the
    cpu_possible_mask.  This can cause hangs on s390, which is not set up to
    deal with workqueue handlers being scheduled on such CPUs.  This commit
    therefore causes Tree SRCU to refrain from queueing workqueue handlers
    on CPUs that have not yet (and might never) come online.
    
    Because callbacks are not invoked on CPUs that have not been
    online, it is an error to invoke call_srcu(), synchronize_srcu(), or
    synchronize_srcu_expedited() on a CPU that is not yet fully online.
    However, it turns out to be less code to redirect the callbacks
    from too-early invocations of call_srcu() than to warn about such
    invocations.  This commit therefore also redirects callbacks queued on
    not-yet-fully-online CPUs to the boot CPU.
    
    Reported-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Tested-by: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Tejun Heo <tj@kernel.org>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0d01cd8c4b4a7..7c2f7cc131f7a 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 {
 	int cpu;
 
-	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1UL << (cpu - snp->grplo))))
-			continue;
-		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
-	}
+	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
+		if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
+			srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 }
 
 /*
@@ -1322,7 +1320,9 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
 	 */
 	idx = __srcu_read_lock_nmisafe(ssp);
 	ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
-	if (ss_state < SRCU_SIZE_WAIT_CALL)
+	// If !rcu_cpu_beenfullyonline(), interrupts are still disabled,
+	// so no migration is possible in either direction from this CPU.
+	if (ss_state < SRCU_SIZE_WAIT_CALL || !rcu_cpu_beenfullyonline(raw_smp_processor_id()))
 		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
 	else
 		sdp = raw_cpu_ptr(ssp->sda);

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-14 19:24         ` Paul E. McKenney
@ 2026-04-29 17:50           ` Vasily Gorbik
  2026-04-29 18:05             ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Vasily Gorbik @ 2026-04-29 17:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390, Samir M,
	Srikar Dronamraju

On Tue, Apr 14, 2026 at 12:24:12PM -0700, Paul E. McKenney wrote:
> On Thu, Apr 09, 2026 at 09:03:26PM -0700, Paul E. McKenney wrote:
> Please see below for the full patch, including refraining from queueing
> workqueue handlers on not-yet-online CPUs and diverting SRCU callbacks
> from not-yet-fully-online CPUs to the boot CPU's callback queue.
...
> commit ce533a60b2ef29a9b516cc717e77c6b679bc09c0
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Thu Apr 9 11:16:02 2026 -0700
> 
>     srcu: Don't queue workqueue handlers to never-online CPUs
>     
>     While an srcu_struct structure is in the midst of switching from CPU-0
>     to all-CPUs state, it can attempt to invoke callbacks for CPUs that
>     have never been online.  Worse yet, it can attempt in invoke callbacks
>     for CPUs that never will be online due to not being present in the
>     cpu_possible_mask.  This can cause hangs on s390, which is not set up to
>     deal with workqueue handlers being scheduled on such CPUs.  This commit
>     therefore causes Tree SRCU to refrain from queueing workqueue handlers
>     on CPUs that have not yet (and might never) come online.
>     
>     Because callbacks are not invoked on CPUs that have not been
>     online, it is an error to invoke call_srcu(), synchronize_srcu(), or
>     synchronize_srcu_expedited() on a CPU that is not yet fully online.
>     However, it turns out to be less code to redirect the callbacks
>     from too-early invocations of call_srcu() than to warn about such
>     invocations.  This commit therefore also redirects callbacks queued on
>     not-yet-fully-online CPUs to the boot CPU.
>     
>     Reported-by: Vasily Gorbik <gor@linux.ibm.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>     Tested-by: Vasily Gorbik <gor@linux.ibm.com>
>     Cc: Tejun Heo <tj@kernel.org>

I retested it on s390 and on x86 KVM with --smp 16,maxcpus=255, all
looks good to me.

FWIW, again:

Tested-by: Vasily Gorbik <gor@linux.ibm.com>

Would you mind adding Cc: stable so it gets picked up for v7.0?
61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
non-preemptible") is what made it reproducible for us.

Thank you!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-29 17:50           ` Vasily Gorbik
@ 2026-04-29 18:05             ` Paul E. McKenney
  2026-04-29 18:23               ` Vasily Gorbik
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-29 18:05 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390, Samir M,
	Srikar Dronamraju

On Wed, Apr 29, 2026 at 07:50:31PM +0200, Vasily Gorbik wrote:
> On Tue, Apr 14, 2026 at 12:24:12PM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 09, 2026 at 09:03:26PM -0700, Paul E. McKenney wrote:
> > Please see below for the full patch, including refraining from queueing
> > workqueue handlers on not-yet-online CPUs and diverting SRCU callbacks
> > from not-yet-fully-online CPUs to the boot CPU's callback queue.
> ...
> > commit ce533a60b2ef29a9b516cc717e77c6b679bc09c0
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Thu Apr 9 11:16:02 2026 -0700
> > 
> >     srcu: Don't queue workqueue handlers to never-online CPUs
> >     
> >     While an srcu_struct structure is in the midst of switching from CPU-0
> >     to all-CPUs state, it can attempt to invoke callbacks for CPUs that
> >     have never been online.  Worse yet, it can attempt in invoke callbacks
> >     for CPUs that never will be online due to not being present in the
> >     cpu_possible_mask.  This can cause hangs on s390, which is not set up to
> >     deal with workqueue handlers being scheduled on such CPUs.  This commit
> >     therefore causes Tree SRCU to refrain from queueing workqueue handlers
> >     on CPUs that have not yet (and might never) come online.
> >     
> >     Because callbacks are not invoked on CPUs that have not been
> >     online, it is an error to invoke call_srcu(), synchronize_srcu(), or
> >     synchronize_srcu_expedited() on a CPU that is not yet fully online.
> >     However, it turns out to be less code to redirect the callbacks
> >     from too-early invocations of call_srcu() than to warn about such
> >     invocations.  This commit therefore also redirects callbacks queued on
> >     not-yet-fully-online CPUs to the boot CPU.
> >     
> >     Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >     Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> >     Cc: Tejun Heo <tj@kernel.org>
> 
> I retested it on s390 and on x86 KVM with --smp 16,maxcpus=255, all
> looks good to me.
> 
> FWIW, again:
> 
> Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> 
> Would you mind adding Cc: stable so it gets picked up for v7.0?
> 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") is what made it reproducible for us.
> 
> Thank you!

And thank you for testing it, plus apologies for the hassle!

At my next rebase, I will add the following:

Fixes: 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when non-preemptible")
Tested-by: Vasily Gorbik <gor@linux.ibm.com>

That should pull it into the needed -stable releases.

Seem reasonable?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-29 18:05             ` Paul E. McKenney
@ 2026-04-29 18:23               ` Vasily Gorbik
  0 siblings, 0 replies; 32+ messages in thread
From: Vasily Gorbik @ 2026-04-29 18:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390, Samir M,
	Srikar Dronamraju

On Wed, Apr 29, 2026 at 11:05:02AM -0700, Paul E. McKenney wrote:
> On Wed, Apr 29, 2026 at 07:50:31PM +0200, Vasily Gorbik wrote:
> > On Tue, Apr 14, 2026 at 12:24:12PM -0700, Paul E. McKenney wrote:
> > > On Thu, Apr 09, 2026 at 09:03:26PM -0700, Paul E. McKenney wrote:
> > > Please see below for the full patch, including refraining from queueing
> > > workqueue handlers on not-yet-online CPUs and diverting SRCU callbacks
> > > from not-yet-fully-online CPUs to the boot CPU's callback queue.
> > ...
> > > commit ce533a60b2ef29a9b516cc717e77c6b679bc09c0
> > > Author: Paul E. McKenney <paulmck@kernel.org>
> > > Date:   Thu Apr 9 11:16:02 2026 -0700
> > > 
> > >     srcu: Don't queue workqueue handlers to never-online CPUs
> > >     
...
> > >     
> > >     Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> > >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > >     Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> > >     Cc: Tejun Heo <tj@kernel.org>
> > 
> > I retested it on s390 and on x86 KVM with --smp 16,maxcpus=255, all
> > looks good to me.
> > 
> > FWIW, again:
> > 
> > Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> > 
> > Would you mind adding Cc: stable so it gets picked up for v7.0?
> > 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > non-preemptible") is what made it reproducible for us.
> > 
> > Thank you!
> 
> And thank you for testing it, plus apologies for the hassle!
> 
> At my next rebase, I will add the following:
> 
> Fixes: 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when non-preemptible")
> Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> 
> That should pull it into the needed -stable releases.
> 
> Seem reasonable?

Perfect, thanks Paul!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 13:08 BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition Vasily Gorbik
  2026-04-09 17:22 ` Paul E. McKenney
@ 2026-04-09 17:26 ` Boqun Feng
  2026-04-09 17:40   ` Boqun Feng
  1 sibling, 1 reply; 32+ messages in thread
From: Boqun Feng @ 2026-04-09 17:26 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Tejun Heo, Lai Jiangshan

On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") defers srcu_node tree allocation when called under
> raw spinlock, putting SRCU through ~6 transitional grace periods
> (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> per-CPU pools directly - pools for not-online CPUs have no workers,

[Cc workqueue]

Hmm.. I thought for offline CPUs the corresponding worker pools become a
unbound one hence there are still workers?

Regards,
Boqun

> work accumulates, workqueue lockup detector fires.
> 
> Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> 
> Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> where possible CPUs > online CPUs is the usual case.
> Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> 
> s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> 
>   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
>   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
>   ...
>   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
>   Showing busy workqueues and worker pools:
>   workqueue rcu_gp: flags=0x108
>     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
>     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
>     ...
>     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
> 
> Not sure if replacing mask = ~0 with something derived from
> cpu_online_mask would be racy in that context.
> 
> [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:26 ` Boqun Feng
@ 2026-04-09 17:40   ` Boqun Feng
  2026-04-09 17:47     ` Tejun Heo
  0 siblings, 1 reply; 32+ messages in thread
From: Boqun Feng @ 2026-04-09 17:40 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Tejun Heo, Lai Jiangshan

On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > non-preemptible") defers srcu_node tree allocation when called under
> > raw spinlock, putting SRCU through ~6 transitional grace periods
> > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > per-CPU pools directly - pools for not-online CPUs have no workers,
> 
> [Cc workqueue]
> 
> Hmm.. I thought for offline CPUs the corresponding worker pools become a
> unbound one hence there are still workers?
> 

Ah, as Paul replied in another email, the problem was because these CPUs
had never been onlined, so they don't even have unbound workers?

Regards,
Boqun

> Regards,
> Boqun
> 
> > work accumulates, workqueue lockup detector fires.
> > 
> > Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> > SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> > 
> > Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> > and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> > where possible CPUs > online CPUs is the usual case.
> > Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> > or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> > or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> > 
> > s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> > 
> >   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   ...
> >   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   Showing busy workqueues and worker pools:
> >   workqueue rcu_gp: flags=0x108
> >     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> >     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> >     ...
> >     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> > 
> > Not sure if replacing mask = ~0 with something derived from
> > cpu_online_mask would be racy in that context.
> > 
> > [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> > [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:40   ` Boqun Feng
@ 2026-04-09 17:47     ` Tejun Heo
  2026-04-09 17:48       ` Tejun Heo
  2026-04-09 18:10       ` Boqun Feng
  0 siblings, 2 replies; 32+ messages in thread
From: Tejun Heo @ 2026-04-09 17:47 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Vasily Gorbik, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan

On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > non-preemptible") defers srcu_node tree allocation when called under
> > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > 
> > [Cc workqueue]
> > 
> > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > unbound one hence there are still workers?
> > 
> 
> Ah, as Paul replied in another email, the problem was because these CPUs
> had never been onlined, so they don't even have unbound workers?

Hahaha, we do initialize worker pool for every possible CPU but the
transition to unbound operation happens in the hot unplug callback. We
probably need to do some of the hot unplug operation during init if the CPU
is possible but not online. That said, what kind of machine is it? Is the
firmware just reporting bogus possible mask? How come the CPUs weren't
online during boot?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:47     ` Tejun Heo
@ 2026-04-09 17:48       ` Tejun Heo
  2026-04-09 18:04         ` Paul E. McKenney
  2026-04-09 18:10       ` Boqun Feng
  1 sibling, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2026-04-09 17:48 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Vasily Gorbik, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan

On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > 
> > > [Cc workqueue]
> > > 
> > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > unbound one hence there are still workers?
> > > 
> > 
> > Ah, as Paul replied in another email, the problem was because these CPUs
> > had never been onlined, so they don't even have unbound workers?
> 
> Hahaha, we do initialize worker pool for every possible CPU but the
> transition to unbound operation happens in the hot unplug callback. We
> probably need to do some of the hot unplug operation during init if the CPU
> is possible but not online. That said, what kind of machine is it? Is the
> firmware just reporting bogus possible mask? How come the CPUs weren't
> online during boot?

Just saw ibm on the cc list. Guess this was on s390?

-- 
tejun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:48       ` Tejun Heo
@ 2026-04-09 18:04         ` Paul E. McKenney
  2026-04-09 18:09           ` Tejun Heo
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-09 18:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Boqun Feng, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

On Thu, Apr 09, 2026 at 07:48:28AM -1000, Tejun Heo wrote:
> On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > 
> > > > [Cc workqueue]
> > > > 
> > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > unbound one hence there are still workers?
> > > > 
> > > 
> > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > had never been onlined, so they don't even have unbound workers?
> > 
> > Hahaha, we do initialize worker pool for every possible CPU but the
> > transition to unbound operation happens in the hot unplug callback. We
> > probably need to do some of the hot unplug operation during init if the CPU
> > is possible but not online. That said, what kind of machine is it? Is the
> > firmware just reporting bogus possible mask? How come the CPUs weren't
> > online during boot?
> 
> Just saw ibm on the cc list. Guess this was on s390?

It was indeed.  What workqueue tricks does s390 play?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 18:04         ` Paul E. McKenney
@ 2026-04-09 18:09           ` Tejun Heo
  2026-04-09 18:15             ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2026-04-09 18:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

On Thu, Apr 09, 2026 at 11:04:09AM -0700, Paul E. McKenney wrote:
> On Thu, Apr 09, 2026 at 07:48:28AM -1000, Tejun Heo wrote:
> > On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > > 
> > > > > [Cc workqueue]
> > > > > 
> > > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > > unbound one hence there are still workers?
> > > > > 
> > > > 
> > > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > > had never been onlined, so they don't even have unbound workers?
> > > 
> > > Hahaha, we do initialize worker pool for every possible CPU but the
> > > transition to unbound operation happens in the hot unplug callback. We
> > > probably need to do some of the hot unplug operation during init if the CPU
> > > is possible but not online. That said, what kind of machine is it? Is the
> > > firmware just reporting bogus possible mask? How come the CPUs weren't
> > > online during boot?
> > 
> > Just saw ibm on the cc list. Guess this was on s390?
> 
> It was indeed.  What workqueue tricks does s390 play?

They just come up with genuinely possible but offline CPUs. Most setups
don't do that. I'll spin up a patch later today.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 18:09           ` Tejun Heo
@ 2026-04-09 18:15             ` Paul E. McKenney
  0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-09 18:15 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Boqun Feng, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

On Thu, Apr 09, 2026 at 08:09:46AM -1000, Tejun Heo wrote:
> On Thu, Apr 09, 2026 at 11:04:09AM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 09, 2026 at 07:48:28AM -1000, Tejun Heo wrote:
> > > On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > > > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > > > 
> > > > > > [Cc workqueue]
> > > > > > 
> > > > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > > > unbound one hence there are still workers?
> > > > > > 
> > > > > 
> > > > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > > > had never been onlined, so they don't even have unbound workers?
> > > > 
> > > > Hahaha, we do initialize worker pool for every possible CPU but the
> > > > transition to unbound operation happens in the hot unplug callback. We
> > > > probably need to do some of the hot unplug operation during init if the CPU
> > > > is possible but not online. That said, what kind of machine is it? Is the
> > > > firmware just reporting bogus possible mask? How come the CPUs weren't
> > > > online during boot?
> > > 
> > > Just saw ibm on the cc list. Guess this was on s390?
> > 
> > It was indeed.  What workqueue tricks does s390 play?
> 
> They just come up with genuinely possible but offline CPUs. Most setups
> don't do that. I'll spin up a patch later today.

I would be more than happy for workqueues to be happy to queue and
execute a handler for a never-been-online CPU.  Whatever works!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:47     ` Tejun Heo
  2026-04-09 17:48       ` Tejun Heo
@ 2026-04-09 18:10       ` Boqun Feng
  2026-04-09 18:27         ` Paul E. McKenney
  2026-04-10 18:53         ` Tejun Heo
  1 sibling, 2 replies; 32+ messages in thread
From: Boqun Feng @ 2026-04-09 18:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Vasily Gorbik, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan

On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > 
> > > [Cc workqueue]
> > > 
> > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > unbound one hence there are still workers?
> > > 
> > 
> > Ah, as Paul replied in another email, the problem was because these CPUs
> > had never been onlined, so they don't even have unbound workers?
> 
> Hahaha, we do initialize worker pool for every possible CPU but the
> transition to unbound operation happens in the hot unplug callback. We

;-) ;-) ;-)

> probably need to do some of the hot unplug operation during init if the CPU

Seems that we (mostly Paul) have our own trick to track whether a CPU
has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
used it in his fix [1]. And I think it won't be that hard to copy it
into workqueue and let queue_work_on() use it so that if the user queues
a work on a never-onlined CPU, it can detect it (with a warning?) and do
something?

[1]: https://lore.kernel.org/rcu/073abb55-197a-4519-b177-f9f776624fed@paulmck-laptop/

Regards,
Boqun

> is possible but not online. That said, what kind of machine is it? Is the
> firmware just reporting bogus possible mask? How come the CPUs weren't
> online during boot?
> 
> Thanks.
> 
> -- 
> tejun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 18:10       ` Boqun Feng
@ 2026-04-09 18:27         ` Paul E. McKenney
  2026-04-10 18:53         ` Tejun Heo
  1 sibling, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-09 18:27 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Tejun Heo, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

On Thu, Apr 09, 2026 at 11:10:04AM -0700, Boqun Feng wrote:
> On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > 
> > > > [Cc workqueue]
> > > > 
> > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > unbound one hence there are still workers?
> > > > 
> > > 
> > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > had never been onlined, so they don't even have unbound workers?
> > 
> > Hahaha, we do initialize worker pool for every possible CPU but the
> > transition to unbound operation happens in the hot unplug callback. We
> 
> ;-) ;-) ;-)
> 
> > probably need to do some of the hot unplug operation during init if the CPU
> 
> Seems that we (mostly Paul) have our own trick to track whether a CPU
> has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> used it in his fix [1]. And I think it won't be that hard to copy it
> into workqueue and let queue_work_on() use it so that if the user queues
> a work on a never-onlined CPU, it can detect it (with a warning?) and do
> something?
> 
> [1]: https://lore.kernel.org/rcu/073abb55-197a-4519-b177-f9f776624fed@paulmck-laptop/

It might be that my patch (or something like it) will be required in
addition to Tejun's fix because the current Tree SRCU code is happy
to schedule a workqueue handler on a CPU that does not even have a bit
set in the cpu_possible_mask.  This could happen on a system with the
first 50 CPUs, as in 0-49, in cpu_possible_mask.  Tree SRCU would then
be quite happy to schedule workqueue handlers on the mythical CPUs 50-63.
Which, now that I think on it, does seem a bit more brave than absolutely
warranted.  ;-)

							Thanx, Paul

> Regards,
> Boqun
> 
> > is possible but not online. That said, what kind of machine is it? Is the
> > firmware just reporting bogus possible mask? How come the CPUs weren't
> > online during boot?
> > 
> > Thanks.
> > 
> > -- 
> > tejun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 18:10       ` Boqun Feng
  2026-04-09 18:27         ` Paul E. McKenney
@ 2026-04-10 18:53         ` Tejun Heo
  2026-04-10 19:17           ` Paul E. McKenney
                             ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Tejun Heo @ 2026-04-10 18:53 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Vasily Gorbik, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan

Hello,

On Thu, Apr 09, 2026 at 11:10:04AM -0700, Boqun Feng wrote:
> On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > 
> > > > [Cc workqueue]
> > > > 
> > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > unbound one hence there are still workers?
> > > > 
> > > 
> > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > had never been onlined, so they don't even have unbound workers?
> > 
> > Hahaha, we do initialize worker pool for every possible CPU but the
> > transition to unbound operation happens in the hot unplug callback. We
> 
> ;-) ;-) ;-)
> 
> > probably need to do some of the hot unplug operation during init if the CPU
> 
> Seems that we (mostly Paul) have our own trick to track whether a CPU
> has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> used it in his fix [1]. And I think it won't be that hard to copy it
> into workqueue and let queue_work_on() use it so that if the user queues
> a work on a never-onlined CPU, it can detect it (with a warning?) and do
> something?

The easiest way to do this is just creating the initial workers for all
possible pools. Please see below. However, the downside is that it's going
to create all workers for all possible cpus. This isn't a problem for
anybody else but these IBM mainframes often come up with a lot of possible
but not-yet-or-ever-online CPUs for capacity management, so the cost may not
be negligible on some configurations.

IBM folks, is that okay?

Also, why do you need to queue work items on an offline CPU? Do they
actually have to be per-cpu? Can you get away with using an unbound
workqueue?

Thanks.

From: Tejun Heo <tj@kernel.org>
Subject: workqueue: Create workers for all possible CPUs on init

Per-CPU worker pools are initialized for every possible CPU during early boot,
but workqueue_init() only creates initial workers for online CPUs. On systems
where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
set but no workers. Any work item queued on such a CPU hangs indefinitely.

This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
non-preemptible") which made SRCU schedule callbacks on all possible CPUs
during size transitions, triggering workqueue lockup warnings for all
never-onlined CPUs.

Create workers for all possible CPUs during init, not just online ones. For
online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
execute on any CPU. When the CPU later comes online, rebind_workers() handles
the transition to associated operation as usual.

Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Boqun Feng <boqun@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/workqueue.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -8068,9 +8068,10 @@ void __init workqueue_init(void)
 		for_each_bh_worker_pool(pool, cpu)
 			BUG_ON(!create_worker(pool));

-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		for_each_cpu_worker_pool(pool, cpu) {
-			pool->flags &= ~POOL_DISASSOCIATED;
+			if (cpu_online(cpu))
+				pool->flags &= ~POOL_DISASSOCIATED;
 			BUG_ON(!create_worker(pool));
 		}
 	}
-- 
tejun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-10 18:53         ` Tejun Heo
@ 2026-04-10 19:17           ` Paul E. McKenney
  2026-04-10 19:29             ` Tejun Heo
  2026-04-29 15:00           ` Srikar Dronamraju
  2026-04-29 18:17           ` Samir M
  2 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-10 19:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Boqun Feng, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

On Fri, Apr 10, 2026 at 08:53:30AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Thu, Apr 09, 2026 at 11:10:04AM -0700, Boqun Feng wrote:
> > On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > > 
> > > > > [Cc workqueue]
> > > > > 
> > > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > > unbound one hence there are still workers?
> > > > > 
> > > > 
> > > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > > had never been onlined, so they don't even have unbound workers?
> > > 
> > > Hahaha, we do initialize worker pool for every possible CPU but the
> > > transition to unbound operation happens in the hot unplug callback. We
> > 
> > ;-) ;-) ;-)
> > 
> > > probably need to do some of the hot unplug operation during init if the CPU
> > 
> > Seems that we (mostly Paul) have our own trick to track whether a CPU
> > has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> > used it in his fix [1]. And I think it won't be that hard to copy it
> > into workqueue and let queue_work_on() use it so that if the user queues
> > a work on a never-onlined CPU, it can detect it (with a warning?) and do
> > something?
> 
> The easiest way to do this is just creating the initial workers for all
> possible pools. Please see below. However, the downside is that it's going
> to create all workers for all possible cpus. This isn't a problem for
> anybody else but these IBM mainframes often come up with a lot of possible
> but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> be negligible on some configurations.
> 
> IBM folks, is that okay?

I have also seen x86 systems whose firmware claimed very large numbers
of CPUs.  :-(

> Also, why do you need to queue work items on an offline CPU? Do they
> actually have to be per-cpu? Can you get away with using an unbound
> workqueue?

It is good for them to run on the specified CPU in the common case for
cache-locality reasons, but if they were occasionally redirected to some
other CPU, that would be just fine.

I am also keeping the patch that avoids queueing work to CPUs that are not
yet fully online.  Further adjustments will be needed if someone invokes
call_srcu(), synchronize_srcu(), or synchronize_srcu_expedited() from an
CPU that is not yet fully online.  Past experience of course suggests that
this will be happen, and that there will be a good reason for it.  ;-)

							Thanx, Paul

> Thanks.
> 
> From: Tejun Heo <tj@kernel.org>
> Subject: workqueue: Create workers for all possible CPUs on init
> 
> Per-CPU worker pools are initialized for every possible CPU during early boot,
> but workqueue_init() only creates initial workers for online CPUs. On systems
> where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
> 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
> set but no workers. Any work item queued on such a CPU hangs indefinitely.
> 
> This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") which made SRCU schedule callbacks on all possible CPUs
> during size transitions, triggering workqueue lockup warnings for all
> never-onlined CPUs.
> 
> Create workers for all possible CPUs during init, not just online ones. For
> online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
> worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
> remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
> execute on any CPU. When the CPU later comes online, rebind_workers() handles
> the transition to associated operation as usual.
> 
> Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Boqun Feng <boqun@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> ---
>  kernel/workqueue.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -8068,9 +8068,10 @@ void __init workqueue_init(void)
>  		for_each_bh_worker_pool(pool, cpu)
>  			BUG_ON(!create_worker(pool));
> 
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>  		for_each_cpu_worker_pool(pool, cpu) {
> -			pool->flags &= ~POOL_DISASSOCIATED;
> +			if (cpu_online(cpu))
> +				pool->flags &= ~POOL_DISASSOCIATED;
>  			BUG_ON(!create_worker(pool));
>  		}
>  	}
> -- 
> tejun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-10 19:17           ` Paul E. McKenney
@ 2026-04-10 19:29             ` Tejun Heo
  0 siblings, 0 replies; 32+ messages in thread
From: Tejun Heo @ 2026-04-10 19:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

Hello, Paul.

On Fri, Apr 10, 2026 at 12:17:21PM -0700, Paul E. McKenney wrote:
> > The easiest way to do this is just creating the initial workers for all
> > possible pools. Please see below. However, the downside is that it's going
> > to create all workers for all possible cpus. This isn't a problem for
> > anybody else but these IBM mainframes often come up with a lot of possible
> > but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> > be negligible on some configurations.
> > 
> > IBM folks, is that okay?
> 
> I have also seen x86 systems whose firmware claimed very large numbers
> of CPUs.  :-(

Yeah, I remember seeing those but at least the ones I remember are from long
times ago. Hopefully, no bios is getting things that wrong anymore.

> > Also, why do you need to queue work items on an offline CPU? Do they
> > actually have to be per-cpu? Can you get away with using an unbound
> > workqueue?
> 
> It is good for them to run on the specified CPU in the common case for
> cache-locality reasons, but if they were occasionally redirected to some
> other CPU, that would be just fine.

I see.  

> I am also keeping the patch that avoids queueing work to CPUs that are not
> yet fully online.  Further adjustments will be needed if someone invokes
> call_srcu(), synchronize_srcu(), or synchronize_srcu_expedited() from an
> CPU that is not yet fully online.  Past experience of course suggests that
> this will be happen, and that there will be a good reason for it.  ;-)

I'm gonna hold for now. From workqueue side, it's a really easy change, so
please let me know if this comes up again.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-10 18:53         ` Tejun Heo
  2026-04-10 19:17           ` Paul E. McKenney
@ 2026-04-29 15:00           ` Srikar Dronamraju
  2026-04-29 17:08             ` Vasily Gorbik
  2026-04-29 18:17           ` Samir M
  2 siblings, 1 reply; 32+ messages in thread
From: Srikar Dronamraju @ 2026-04-29 15:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Boqun Feng, Vasily Gorbik, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan, samir

* Tejun Heo <tj@kernel.org> [2026-04-10 08:53:30]:

Hi Tejun,

[ copying Samir Mulani to this thread ]

> Hello,
> 
> > Seems that we (mostly Paul) have our own trick to track whether a CPU
> > has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> > used it in his fix [1]. And I think it won't be that hard to copy it
> > into workqueue and let queue_work_on() use it so that if the user queues
> > a work on a never-onlined CPU, it can detect it (with a warning?) and do
> > something?
> 
> The easiest way to do this is just creating the initial workers for all
> possible pools. Please see below. However, the downside is that it's going
> to create all workers for all possible cpus. This isn't a problem for
> anybody else but these IBM mainframes often come up with a lot of possible
> but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> be negligible on some configurations.
> 
> IBM folks, is that okay?

Even on PowerPC LPARS, its not uncommon to have possible cpus != online cpus
at boot.  However your approach will work.

And Samir has already tested the same too and reported here
https://lkml.kernel.org/r/1b89c25b-7c1d-4ed8-adf3-ac504b6f086a@linux.ibm.com

> 
> Also, why do you need to queue work items on an offline CPU? Do they
> actually have to be per-cpu? Can you get away with using an unbound
> workqueue?
> 
> Thanks.
> 
> From: Tejun Heo <tj@kernel.org>
> Subject: workqueue: Create workers for all possible CPUs on init
> 
> Per-CPU worker pools are initialized for every possible CPU during early boot,
> but workqueue_init() only creates initial workers for online CPUs. On systems
> where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
> 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
> set but no workers. Any work item queued on such a CPU hangs indefinitely.
> 
> This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") which made SRCU schedule callbacks on all possible CPUs
> during size transitions, triggering workqueue lockup warnings for all
> never-onlined CPUs.
> 
> Create workers for all possible CPUs during init, not just online ones. For
> online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
> worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
> remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
> execute on any CPU. When the CPU later comes online, rebind_workers() handles
> the transition to associated operation as usual.
> 

With these patch, if a CPU has been onlined once, it's should be ok to queue
the work on that CPU even if its offline now.

> Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Boqun Feng <boqun@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>

Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>

> ---
>  kernel/workqueue.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -8068,9 +8068,10 @@ void __init workqueue_init(void)
>  		for_each_bh_worker_pool(pool, cpu)
>  			BUG_ON(!create_worker(pool));
> 
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>  		for_each_cpu_worker_pool(pool, cpu) {
> -			pool->flags &= ~POOL_DISASSOCIATED;
> +			if (cpu_online(cpu))
> +				pool->flags &= ~POOL_DISASSOCIATED;
>  			BUG_ON(!create_worker(pool));
>  		}
>  	}
> -- 
> tejun

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-29 15:00           ` Srikar Dronamraju
@ 2026-04-29 17:08             ` Vasily Gorbik
  2026-04-29 17:18               ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Vasily Gorbik @ 2026-04-29 17:08 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Tejun Heo, Boqun Feng, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan, samir

On Wed, Apr 29, 2026 at 08:30:38PM +0530, Srikar Dronamraju wrote:
> * Tejun Heo <tj@kernel.org> [2026-04-10 08:53:30]:
> > Hello,
> > 
> > > Seems that we (mostly Paul) have our own trick to track whether a CPU
> > > has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> > > used it in his fix [1]. And I think it won't be that hard to copy it
> > > into workqueue and let queue_work_on() use it so that if the user queues
> > > a work on a never-onlined CPU, it can detect it (with a warning?) and do
> > > something?
> > 
> > The easiest way to do this is just creating the initial workers for all
> > possible pools. Please see below. However, the downside is that it's going
> > to create all workers for all possible cpus. This isn't a problem for
> > anybody else but these IBM mainframes often come up with a lot of possible
> > but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> > be negligible on some configurations.
> > 
> > IBM folks, is that okay?
> 
> Even on PowerPC LPARS, its not uncommon to have possible cpus != online cpus
> at boot.  However your approach will work.
> 
> And Samir has already tested the same too and reported here
> https://lkml.kernel.org/r/1b89c25b-7c1d-4ed8-adf3-ac504b6f086a@linux.ibm.com
> 
> > From: Tejun Heo <tj@kernel.org>
> > Subject: workqueue: Create workers for all possible CPUs on init
> > 
> > Per-CPU worker pools are initialized for every possible CPU during early boot,
> > but workqueue_init() only creates initial workers for online CPUs. On systems
> > where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
> > 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
> > set but no workers. Any work item queued on such a CPU hangs indefinitely.
> > 
> > This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > non-preemptible") which made SRCU schedule callbacks on all possible CPUs
> > during size transitions, triggering workqueue lockup warnings for all
> > never-onlined CPUs.
> > 
> > Create workers for all possible CPUs during init, not just online ones. For
> > online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
> > worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
> > remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
> > execute on any CPU. When the CPU later comes online, rebind_workers() handles
> > the transition to associated operation as usual.
> > 
> 
> With these patch, if a CPU has been onlined once, it's should be ok to queue
> the work on that CPU even if its offline now.

That already seems to hold without this patch, what this patch newly
covers is queueing on CPUs that have never been online.

Do we actually need to create workers for every possible CPU at boot?
On the s390 LPAR in question (76 online / 400 possible) that's a few
hundred extra kthreads kept around for the life of the system.
That's probably the same on PowerPC.

Wouldn't Paul's SRCU-side fix [1] alone be enough here for PowerPC
as well? I retested it on s390 (76/400) and on x86 KVM with
--smp 16,maxcpus=255 and the lockup didn't reproduce in either case.

[1] https://lore.kernel.org/rcu/ed1fa6cd-7343-4ca3-8b9d-d699ca496f83@paulmck-laptop/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-29 17:08             ` Vasily Gorbik
@ 2026-04-29 17:18               ` Paul E. McKenney
  2026-04-29 17:44                 ` Shrikanth Hegde
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-29 17:18 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Srikar Dronamraju, Tejun Heo, Boqun Feng, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan, samir

On Wed, Apr 29, 2026 at 07:08:23PM +0200, Vasily Gorbik wrote:
> On Wed, Apr 29, 2026 at 08:30:38PM +0530, Srikar Dronamraju wrote:
> > * Tejun Heo <tj@kernel.org> [2026-04-10 08:53:30]:
> > > Hello,
> > > 
> > > > Seems that we (mostly Paul) have our own trick to track whether a CPU
> > > > has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> > > > used it in his fix [1]. And I think it won't be that hard to copy it
> > > > into workqueue and let queue_work_on() use it so that if the user queues
> > > > a work on a never-onlined CPU, it can detect it (with a warning?) and do
> > > > something?
> > > 
> > > The easiest way to do this is just creating the initial workers for all
> > > possible pools. Please see below. However, the downside is that it's going
> > > to create all workers for all possible cpus. This isn't a problem for
> > > anybody else but these IBM mainframes often come up with a lot of possible
> > > but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> > > be negligible on some configurations.
> > > 
> > > IBM folks, is that okay?
> > 
> > Even on PowerPC LPARS, its not uncommon to have possible cpus != online cpus
> > at boot.  However your approach will work.
> > 
> > And Samir has already tested the same too and reported here
> > https://lkml.kernel.org/r/1b89c25b-7c1d-4ed8-adf3-ac504b6f086a@linux.ibm.com
> > 
> > > From: Tejun Heo <tj@kernel.org>
> > > Subject: workqueue: Create workers for all possible CPUs on init
> > > 
> > > Per-CPU worker pools are initialized for every possible CPU during early boot,
> > > but workqueue_init() only creates initial workers for online CPUs. On systems
> > > where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
> > > 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
> > > set but no workers. Any work item queued on such a CPU hangs indefinitely.
> > > 
> > > This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > non-preemptible") which made SRCU schedule callbacks on all possible CPUs
> > > during size transitions, triggering workqueue lockup warnings for all
> > > never-onlined CPUs.
> > > 
> > > Create workers for all possible CPUs during init, not just online ones. For
> > > online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
> > > worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
> > > remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
> > > execute on any CPU. When the CPU later comes online, rebind_workers() handles
> > > the transition to associated operation as usual.
> > > 
> > 
> > With these patch, if a CPU has been onlined once, it's should be ok to queue
> > the work on that CPU even if its offline now.
> 
> That already seems to hold without this patch, what this patch newly
> covers is queueing on CPUs that have never been online.
> 
> Do we actually need to create workers for every possible CPU at boot?
> On the s390 LPAR in question (76 online / 400 possible) that's a few
> hundred extra kthreads kept around for the life of the system.
> That's probably the same on PowerPC.
> 
> Wouldn't Paul's SRCU-side fix [1] alone be enough here for PowerPC
> as well? I retested it on s390 (76/400) and on x86 KVM with
> --smp 16,maxcpus=255 and the lockup didn't reproduce in either case.
> 
> [1] https://lore.kernel.org/rcu/ed1fa6cd-7343-4ca3-8b9d-d699ca496f83@paulmck-laptop/

Just to emphasize that SRCU really was buggy before my fix.  The
queue_work_on() kernel-doc header clearly states the rules.  The bug
is even more embarrassing given just who it was that wrote those two
sentences.  ;-)

							Thanx, Paul

/**
 * queue_work_on - queue work on specific cpu
 * @cpu: CPU number to execute work on
 * @wq: workqueue to use
 * @work: work to queue
 *
 * We queue the work to a specific CPU, the caller must ensure it
 * can't go away.  Callers that fail to ensure that the specified
 * CPU cannot go away will execute on a randomly chosen CPU.
 * But note well that callers specifying a CPU that never has been
 * online will get a splat.
 *
 * Return: %false if @work was already on a queue, %true otherwise.
 */

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-29 17:18               ` Paul E. McKenney
@ 2026-04-29 17:44                 ` Shrikanth Hegde
  2026-04-29 18:01                   ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Shrikanth Hegde @ 2026-04-29 17:44 UTC (permalink / raw)
  To: paulmck, Tejun Heo, Vasily Gorbik
  Cc: Srikar Dronamraju, Boqun Feng, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan, samir


I have limited understanding in rcu or workqueues, but my two cents.

On 4/29/26 10:48 PM, Paul E. McKenney wrote:
> On Wed, Apr 29, 2026 at 07:08:23PM +0200, Vasily Gorbik wrote:
>> On Wed, Apr 29, 2026 at 08:30:38PM +0530, Srikar Dronamraju wrote:
>>> * Tejun Heo <tj@kernel.org> [2026-04-10 08:53:30]:
>>>> Hello,
>>>>
>>>>> Seems that we (mostly Paul) have our own trick to track whether a CPU
>>>>> has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
>>>>> used it in his fix [1]. And I think it won't be that hard to copy it
>>>>> into workqueue and let queue_work_on() use it so that if the user queues
>>>>> a work on a never-onlined CPU, it can detect it (with a warning?) and do
>>>>> something?
>>>>
>>>> The easiest way to do this is just creating the initial workers for all
>>>> possible pools. Please see below. However, the downside is that it's going
>>>> to create all workers for all possible cpus. This isn't a problem for
>>>> anybody else but these IBM mainframes often come up with a lot of possible
>>>> but not-yet-or-ever-online CPUs for capacity management, so the cost may not
>>>> be negligible on some configurations.
>>>>
>>>> IBM folks, is that okay?
>>>
>>> Even on PowerPC LPARS, its not uncommon to have possible cpus != online cpus
>>> at boot.  However your approach will work.
>>>
>>> And Samir has already tested the same too and reported here
>>> https://lkml.kernel.org/r/1b89c25b-7c1d-4ed8-adf3-ac504b6f086a@linux.ibm.com
>>>
>>>> From: Tejun Heo <tj@kernel.org>
>>>> Subject: workqueue: Create workers for all possible CPUs on init
>>>>
>>>> Per-CPU worker pools are initialized for every possible CPU during early boot,
>>>> but workqueue_init() only creates initial workers for online CPUs. On systems
>>>> where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
>>>> 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
>>>> set but no workers. Any work item queued on such a CPU hangs indefinitely.
>>>>
>>>> This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
>>>> non-preemptible") which made SRCU schedule callbacks on all possible CPUs
>>>> during size transitions, triggering workqueue lockup warnings for all
>>>> never-onlined CPUs.
>>>>
>>>> Create workers for all possible CPUs during init, not just online ones. For
>>>> online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
>>>> worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
>>>> remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
>>>> execute on any CPU. When the CPU later comes online, rebind_workers() handles
>>>> the transition to associated operation as usual.
>>>>
>>>
>>> With these patch, if a CPU has been onlined once, it's should be ok to queue
>>> the work on that CPU even if its offline now.
>>
>> That already seems to hold without this patch, what this patch newly
>> covers is queueing on CPUs that have never been online.
>>
>> Do we actually need to create workers for every possible CPU at boot?
>> On the s390 LPAR in question (76 online / 400 possible) that's a few
>> hundred extra kthreads kept around for the life of the system.
>> That's probably the same on PowerPC.
>>
>> Wouldn't Paul's SRCU-side fix [1] alone be enough here for PowerPC
>> as well? I retested it on s390 (76/400) and on x86 KVM with
>> --smp 16,maxcpus=255 and the lockup didn't reproduce in either case.
>>
>> [1] https://lore.kernel.org/rcu/ed1fa6cd-7343-4ca3-8b9d-d699ca496f83@paulmck-laptop/
> 
> Just to emphasize that SRCU really was buggy before my fix.  The
> queue_work_on() kernel-doc header clearly states the rules.  The bug
> is even more embarrassing given just who it was that wrote those two
> sentences.  ;-)
> 

That mask = ~0 is really looks uncomfortable to me. What does it mean?
It might end up even sending to non possible CPUs without proper checks.

It should use either cpumask_setall? or use cpu_online_mask?

Your current patch rcu_cpu_beenfullyonline indicates that code around
srcu_schedule_cbs_sdp handles hotplug already right?
in that case, just setting mask = cpu_online_mask would work?


> 							Thanx, Paul
> 
> /**
>   * queue_work_on - queue work on specific cpu
>   * @cpu: CPU number to execute work on
>   * @wq: workqueue to use
>   * @work: work to queue
>   *
>   * We queue the work to a specific CPU, the caller must ensure it
>   * can't go away.  Callers that fail to ensure that the specified
>   * CPU cannot go away will execute on a randomly chosen CPU.
>   * But note well that callers specifying a CPU that never has been
>   * online will get a splat.
>   *
>   * Return: %false if @work was already on a queue, %true otherwise.
>   */


In that case, making offline CPUs have a unbound workqueue is wrong. no?

It might encourage more users to abuse queue_work_on interface to
send to offline CPUs without any checks and onus now falls onto
workqueue to disaptch to unbound wqs.

So I think it is better to put the guardrails in SRCU instead of any change in
workqueue.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-29 17:44                 ` Shrikanth Hegde
@ 2026-04-29 18:01                   ` Paul E. McKenney
  2026-04-30  7:08                     ` Shrikanth Hegde
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-29 18:01 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Tejun Heo, Vasily Gorbik, Srikar Dronamraju, Boqun Feng,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390, Lai Jiangshan,
	samir

On Wed, Apr 29, 2026 at 11:14:56PM +0530, Shrikanth Hegde wrote:
> 
> I have limited understanding in rcu or workqueues, but my two cents.
> 
> On 4/29/26 10:48 PM, Paul E. McKenney wrote:
> > On Wed, Apr 29, 2026 at 07:08:23PM +0200, Vasily Gorbik wrote:
> > > On Wed, Apr 29, 2026 at 08:30:38PM +0530, Srikar Dronamraju wrote:
> > > > * Tejun Heo <tj@kernel.org> [2026-04-10 08:53:30]:
> > > > > Hello,
> > > > > 
> > > > > > Seems that we (mostly Paul) have our own trick to track whether a CPU
> > > > > > has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> > > > > > used it in his fix [1]. And I think it won't be that hard to copy it
> > > > > > into workqueue and let queue_work_on() use it so that if the user queues
> > > > > > a work on a never-onlined CPU, it can detect it (with a warning?) and do
> > > > > > something?
> > > > > 
> > > > > The easiest way to do this is just creating the initial workers for all
> > > > > possible pools. Please see below. However, the downside is that it's going
> > > > > to create all workers for all possible cpus. This isn't a problem for
> > > > > anybody else but these IBM mainframes often come up with a lot of possible
> > > > > but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> > > > > be negligible on some configurations.
> > > > > 
> > > > > IBM folks, is that okay?
> > > > 
> > > > Even on PowerPC LPARS, its not uncommon to have possible cpus != online cpus
> > > > at boot.  However your approach will work.
> > > > 
> > > > And Samir has already tested the same too and reported here
> > > > https://lkml.kernel.org/r/1b89c25b-7c1d-4ed8-adf3-ac504b6f086a@linux.ibm.com
> > > > 
> > > > > From: Tejun Heo <tj@kernel.org>
> > > > > Subject: workqueue: Create workers for all possible CPUs on init
> > > > > 
> > > > > Per-CPU worker pools are initialized for every possible CPU during early boot,
> > > > > but workqueue_init() only creates initial workers for online CPUs. On systems
> > > > > where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
> > > > > 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
> > > > > set but no workers. Any work item queued on such a CPU hangs indefinitely.
> > > > > 
> > > > > This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > non-preemptible") which made SRCU schedule callbacks on all possible CPUs
> > > > > during size transitions, triggering workqueue lockup warnings for all
> > > > > never-onlined CPUs.
> > > > > 
> > > > > Create workers for all possible CPUs during init, not just online ones. For
> > > > > online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
> > > > > worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
> > > > > remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
> > > > > execute on any CPU. When the CPU later comes online, rebind_workers() handles
> > > > > the transition to associated operation as usual.
> > > > > 
> > > > 
> > > > With these patch, if a CPU has been onlined once, it's should be ok to queue
> > > > the work on that CPU even if its offline now.
> > > 
> > > That already seems to hold without this patch, what this patch newly
> > > covers is queueing on CPUs that have never been online.
> > > 
> > > Do we actually need to create workers for every possible CPU at boot?
> > > On the s390 LPAR in question (76 online / 400 possible) that's a few
> > > hundred extra kthreads kept around for the life of the system.
> > > That's probably the same on PowerPC.
> > > 
> > > Wouldn't Paul's SRCU-side fix [1] alone be enough here for PowerPC
> > > as well? I retested it on s390 (76/400) and on x86 KVM with
> > > --smp 16,maxcpus=255 and the lockup didn't reproduce in either case.
> > > 
> > > [1] https://lore.kernel.org/rcu/ed1fa6cd-7343-4ca3-8b9d-d699ca496f83@paulmck-laptop/
> > 
> > Just to emphasize that SRCU really was buggy before my fix.  The
> > queue_work_on() kernel-doc header clearly states the rules.  The bug
> > is even more embarrassing given just who it was that wrote those two
> > sentences.  ;-)
> 
> That mask = ~0 is really looks uncomfortable to me. What does it mean?
> It might end up even sending to non possible CPUs without proper checks.
> 
> It should use either cpumask_setall? or use cpu_online_mask?
> 
> Your current patch rcu_cpu_beenfullyonline indicates that code around
> srcu_schedule_cbs_sdp handles hotplug already right?
> in that case, just setting mask = cpu_online_mask would work?

Agreed.  Which is why I have this commit queued:

f8d5aaaf90f8 ("srcu: Don't queue workqueue handlers to never-online CPUs")

This is currently slated for the upcoming merge window, but if you
need it sooner, please let us know.  Please see the end of this email
for the full commit.


							Thanx, Paul

> > /**
> >   * queue_work_on - queue work on specific cpu
> >   * @cpu: CPU number to execute work on
> >   * @wq: workqueue to use
> >   * @work: work to queue
> >   *
> >   * We queue the work to a specific CPU, the caller must ensure it
> >   * can't go away.  Callers that fail to ensure that the specified
> >   * CPU cannot go away will execute on a randomly chosen CPU.
> >   * But note well that callers specifying a CPU that never has been
> >   * online will get a splat.
> >   *
> >   * Return: %false if @work was already on a queue, %true otherwise.
> >   */
> 
> 
> In that case, making offline CPUs have a unbound workqueue is wrong. no?
> 
> It might encourage more users to abuse queue_work_on interface to
> send to offline CPUs without any checks and onus now falls onto
> workqueue to disaptch to unbound wqs.
> 
> So I think it is better to put the guardrails in SRCU instead of any change in
> workqueue.

------------------------------------------------------------------------

commit f8d5aaaf90f8294890802ce8dccbafd9850ac5f9
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Apr 9 11:16:02 2026 -0700

    srcu: Don't queue workqueue handlers to never-online CPUs
    
    While an srcu_struct structure is in the midst of switching from CPU-0
    to all-CPUs state, it can attempt to invoke callbacks for CPUs that
    have never been online.  Worse yet, it can attempt in invoke callbacks
    for CPUs that never will be online due to not being present in the
    cpu_possible_mask.  This can cause hangs on s390, which is not set up to
    deal with workqueue handlers being scheduled on such CPUs.  This commit
    therefore causes Tree SRCU to refrain from queueing workqueue handlers
    on CPUs that have not yet (and might never) come online.
    
    Because callbacks are not invoked on CPUs that have not been
    online, it is an error to invoke call_srcu(), synchronize_srcu(), or
    synchronize_srcu_expedited() on a CPU that is not yet fully online.
    However, it turns out to be less code to redirect the callbacks
    from too-early invocations of call_srcu() than to warn about such
    invocations.  This commit therefore also redirects callbacks queued on
    not-yet-fully-online CPUs to the boot CPU.
    
    Reported-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Tested-by: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Tejun Heo <tj@kernel.org>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0d01cd8c4b4a7b..7c2f7cc131f7ae 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 {
 	int cpu;
 
-	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1UL << (cpu - snp->grplo))))
-			continue;
-		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
-	}
+	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
+		if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
+			srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 }
 
 /*
@@ -1322,7 +1320,9 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
 	 */
 	idx = __srcu_read_lock_nmisafe(ssp);
 	ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
-	if (ss_state < SRCU_SIZE_WAIT_CALL)
+	// If !rcu_cpu_beenfullyonline(), interrupts are still disabled,
+	// so no migration is possible in either direction from this CPU.
+	if (ss_state < SRCU_SIZE_WAIT_CALL || !rcu_cpu_beenfullyonline(raw_smp_processor_id()))
 		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
 	else
 		sdp = raw_cpu_ptr(ssp->sda);

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-29 18:01                   ` Paul E. McKenney
@ 2026-04-30  7:08                     ` Shrikanth Hegde
  2026-04-30 16:05                       ` Paul E. McKenney
  2026-04-30 16:10                       ` Paul E. McKenney
  0 siblings, 2 replies; 32+ messages in thread
From: Shrikanth Hegde @ 2026-04-30  7:08 UTC (permalink / raw)
  To: paulmck
  Cc: Tejun Heo, Vasily Gorbik, Srikar Dronamraju, Boqun Feng,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390, Lai Jiangshan,
	samir

Hi Paul.

On 4/29/26 11:31 PM, Paul E. McKenney wrote:

>> That mask = ~0 is really looks uncomfortable to me. What does it mean?
>> It might end up even sending to non possible CPUs without proper checks.
>>
>> It should use either cpumask_setall? or use cpu_online_mask?
>>
>> Your current patch rcu_cpu_beenfullyonline indicates that code around
>> srcu_schedule_cbs_sdp handles hotplug already right?
>> in that case, just setting mask = cpu_online_mask would work?
> 
> Agreed.  Which is why I have this commit queued:
> 
> f8d5aaaf90f8 ("srcu: Don't queue workqueue handlers to never-online CPUs")
> 
> This is currently slated for the upcoming merge window, but if you
> need it sooner, please let us know.  Please see the end of this email
> for the full commit.
> 
> 
> 							Thanx, Paul
> 
>>> /**
>>>    * queue_work_on - queue work on specific cpu
>>>    * @cpu: CPU number to execute work on
>>>    * @wq: workqueue to use
>>>    * @work: work to queue
>>>    *
>>>    * We queue the work to a specific CPU, the caller must ensure it
>>>    * can't go away.  Callers that fail to ensure that the specified
>>>    * CPU cannot go away will execute on a randomly chosen CPU.
>>>    * But note well that callers specifying a CPU that never has been
>>>    * online will get a splat.
>>>    *
>>>    * Return: %false if @work was already on a queue, %true otherwise.
>>>    */
>>
>>
>> In that case, making offline CPUs have a unbound workqueue is wrong. no?
>>
>> It might encourage more users to abuse queue_work_on interface to
>> send to offline CPUs without any checks and onus now falls onto
>> workqueue to disaptch to unbound wqs.
>>
>> So I think it is better to put the guardrails in SRCU instead of any change in
>> workqueue.
> 
> ------------------------------------------------------------------------
> 
> commit f8d5aaaf90f8294890802ce8dccbafd9850ac5f9
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Thu Apr 9 11:16:02 2026 -0700
> 
>      srcu: Don't queue workqueue handlers to never-online CPUs
>      
>      While an srcu_struct structure is in the midst of switching from CPU-0
>      to all-CPUs state, it can attempt to invoke callbacks for CPUs that
>      have never been online.  Worse yet, it can attempt in invoke callbacks
>      for CPUs that never will be online due to not being present in the

for CPUs that never will be online due to being present in the cpu_possible_mask?

>      cpu_possible_mask.  This can cause hangs on s390, which is not set up to
>      deal with workqueue handlers being scheduled on such CPUs.  This commit
>      therefore causes Tree SRCU to refrain from queueing workqueue handlers
>      on CPUs that have not yet (and might never) come online.
>      
>      Because callbacks are not invoked on CPUs that have not been
>      online, it is an error to invoke call_srcu(), synchronize_srcu(), or
>      synchronize_srcu_expedited() on a CPU that is not yet fully online.
>      However, it turns out to be less code to redirect the callbacks
>      from too-early invocations of call_srcu() than to warn about such
>      invocations.  This commit therefore also redirects callbacks queued on
>      not-yet-fully-online CPUs to the boot CPU.
>      
>      Reported-by: Vasily Gorbik <gor@linux.ibm.com>
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>      Tested-by: Vasily Gorbik <gor@linux.ibm.com>
>      Cc: Tejun Heo <tj@kernel.org>
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 0d01cd8c4b4a7b..7c2f7cc131f7ae 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>   {
>   	int cpu;
>   
> -	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> -		if (!(mask & (1UL << (cpu - snp->grplo))))
> -			continue;
> -		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> -	}
> +	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
> +		if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
> +			srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>   }
>   
>   /*
> @@ -1322,7 +1320,9 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
>   	 */
>   	idx = __srcu_read_lock_nmisafe(ssp);
>   	ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
> -	if (ss_state < SRCU_SIZE_WAIT_CALL)
> +	// If !rcu_cpu_beenfullyonline(), interrupts are still disabled,
> +	// so no migration is possible in either direction from this CPU.
> +	if (ss_state < SRCU_SIZE_WAIT_CALL || !rcu_cpu_beenfullyonline(raw_smp_processor_id()))

How can this happen? To get a CPU offline in raw_smp_processor_id() you need to run on the offline
CPU.

>   		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
>   	else
>   		sdp = raw_cpu_ptr(ssp->sda);


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-30  7:08                     ` Shrikanth Hegde
@ 2026-04-30 16:05                       ` Paul E. McKenney
  2026-04-30 16:10                       ` Paul E. McKenney
  1 sibling, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-30 16:05 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Tejun Heo, Vasily Gorbik, Srikar Dronamraju, Boqun Feng,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390, Lai Jiangshan,
	samir

On Thu, Apr 30, 2026 at 12:38:16PM +0530, Shrikanth Hegde wrote:
> Hi Paul.
> 
> On 4/29/26 11:31 PM, Paul E. McKenney wrote:
> 
> > > That mask = ~0 is really looks uncomfortable to me. What does it mean?
> > > It might end up even sending to non possible CPUs without proper checks.
> > > 
> > > It should use either cpumask_setall? or use cpu_online_mask?
> > > 
> > > Your current patch rcu_cpu_beenfullyonline indicates that code around
> > > srcu_schedule_cbs_sdp handles hotplug already right?
> > > in that case, just setting mask = cpu_online_mask would work?
> > 
> > Agreed.  Which is why I have this commit queued:
> > 
> > f8d5aaaf90f8 ("srcu: Don't queue workqueue handlers to never-online CPUs")
> > 
> > This is currently slated for the upcoming merge window, but if you
> > need it sooner, please let us know.  Please see the end of this email
> > for the full commit.
> > 
> > 
> > 							Thanx, Paul
> > 
> > > > /**
> > > >    * queue_work_on - queue work on specific cpu
> > > >    * @cpu: CPU number to execute work on
> > > >    * @wq: workqueue to use
> > > >    * @work: work to queue
> > > >    *
> > > >    * We queue the work to a specific CPU, the caller must ensure it
> > > >    * can't go away.  Callers that fail to ensure that the specified
> > > >    * CPU cannot go away will execute on a randomly chosen CPU.
> > > >    * But note well that callers specifying a CPU that never has been
> > > >    * online will get a splat.
> > > >    *
> > > >    * Return: %false if @work was already on a queue, %true otherwise.
> > > >    */
> > > 
> > > 
> > > In that case, making offline CPUs have a unbound workqueue is wrong. no?
> > > 
> > > It might encourage more users to abuse queue_work_on interface to
> > > send to offline CPUs without any checks and onus now falls onto
> > > workqueue to disaptch to unbound wqs.
> > > 
> > > So I think it is better to put the guardrails in SRCU instead of any change in
> > > workqueue.
> > 
> > ------------------------------------------------------------------------
> > 
> > commit f8d5aaaf90f8294890802ce8dccbafd9850ac5f9
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Thu Apr 9 11:16:02 2026 -0700
> > 
> >      srcu: Don't queue workqueue handlers to never-online CPUs
> >      While an srcu_struct structure is in the midst of switching from CPU-0
> >      to all-CPUs state, it can attempt to invoke callbacks for CPUs that
> >      have never been online.  Worse yet, it can attempt in invoke callbacks
> >      for CPUs that never will be online due to not being present in the
> 
> for CPUs that never will be online due to being present in the cpu_possible_mask?
> 
> >      cpu_possible_mask.  This can cause hangs on s390, which is not set up to
> >      deal with workqueue handlers being scheduled on such CPUs.  This commit
> >      therefore causes Tree SRCU to refrain from queueing workqueue handlers
> >      on CPUs that have not yet (and might never) come online.
> >      Because callbacks are not invoked on CPUs that have not been
> >      online, it is an error to invoke call_srcu(), synchronize_srcu(), or
> >      synchronize_srcu_expedited() on a CPU that is not yet fully online.
> >      However, it turns out to be less code to redirect the callbacks
> >      from too-early invocations of call_srcu() than to warn about such
> >      invocations.  This commit therefore also redirects callbacks queued on
> >      not-yet-fully-online CPUs to the boot CPU.
> >      Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> >      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >      Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> >      Cc: Tejun Heo <tj@kernel.org>
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 0d01cd8c4b4a7b..7c2f7cc131f7ae 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
> >   {
> >   	int cpu;
> > -	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > -		if (!(mask & (1UL << (cpu - snp->grplo))))
> > -			continue;
> > -		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> > -	}
> > +	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
> > +		if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
> > +			srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> >   }
> >   /*
> > @@ -1322,7 +1320,9 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
> >   	 */
> >   	idx = __srcu_read_lock_nmisafe(ssp);
> >   	ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
> > -	if (ss_state < SRCU_SIZE_WAIT_CALL)
> > +	// If !rcu_cpu_beenfullyonline(), interrupts are still disabled,
> > +	// so no migration is possible in either direction from this CPU.
> > +	if (ss_state < SRCU_SIZE_WAIT_CALL || !rcu_cpu_beenfullyonline(raw_smp_processor_id()))
> 
> How can this happen? To get a CPU offline in raw_smp_processor_id() you need to run on the offline
> CPU.

CPUs run for a surprisingly long time before they get around to marking
themselves online.  If a CPU invokes call_srcu() during this time,
this code really will be running on a CPU that is marked as offline.

Now, my initial thought was to instead splat if this happened, but it
turned out to require more code to reliably splat than to just handle
the situation correctly.  So here we are!  ;-)

							Thanx, Paul

> >   		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> >   	else
> >   		sdp = raw_cpu_ptr(ssp->sda);
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-30  7:08                     ` Shrikanth Hegde
  2026-04-30 16:05                       ` Paul E. McKenney
@ 2026-04-30 16:10                       ` Paul E. McKenney
  2026-05-01 13:17                         ` Shrikanth Hegde
  1 sibling, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2026-04-30 16:10 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Tejun Heo, Vasily Gorbik, Srikar Dronamraju, Boqun Feng,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390, Lai Jiangshan,
	samir

On Thu, Apr 30, 2026 at 12:38:16PM +0530, Shrikanth Hegde wrote:
> Hi Paul.
> 
> On 4/29/26 11:31 PM, Paul E. McKenney wrote:

[ . . . ]

Sorry, missed one...

> > ------------------------------------------------------------------------
> > 
> > commit f8d5aaaf90f8294890802ce8dccbafd9850ac5f9
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Thu Apr 9 11:16:02 2026 -0700
> > 
> >      srcu: Don't queue workqueue handlers to never-online CPUs
> >      While an srcu_struct structure is in the midst of switching from CPU-0
> >      to all-CPUs state, it can attempt to invoke callbacks for CPUs that
> >      have never been online.  Worse yet, it can attempt in invoke callbacks
> >      for CPUs that never will be online due to not being present in the
> 
> for CPUs that never will be online due to being present in the cpu_possible_mask?

Exactly.

Just because a CPU is in cpu_possible_mask doesn't mean that it will
ever actually come online.  For example, for single-threaded performance
reasons, a given system might choose to bring online only one CPU from
each hypertheaded core.  In that case, the other CPU in each hyperthreaded
core could be in the cpu_possible_mask, but would never come online.

							Thanx, Paul

> >      cpu_possible_mask.  This can cause hangs on s390, which is not set up to
> >      deal with workqueue handlers being scheduled on such CPUs.  This commit
> >      therefore causes Tree SRCU to refrain from queueing workqueue handlers
> >      on CPUs that have not yet (and might never) come online.
> >      Because callbacks are not invoked on CPUs that have not been
> >      online, it is an error to invoke call_srcu(), synchronize_srcu(), or
> >      synchronize_srcu_expedited() on a CPU that is not yet fully online.
> >      However, it turns out to be less code to redirect the callbacks
> >      from too-early invocations of call_srcu() than to warn about such
> >      invocations.  This commit therefore also redirects callbacks queued on
> >      not-yet-fully-online CPUs to the boot CPU.
> >      Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> >      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >      Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> >      Cc: Tejun Heo <tj@kernel.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-30 16:10                       ` Paul E. McKenney
@ 2026-05-01 13:17                         ` Shrikanth Hegde
  2026-05-01 14:00                           ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Shrikanth Hegde @ 2026-05-01 13:17 UTC (permalink / raw)
  To: paulmck
  Cc: Tejun Heo, Vasily Gorbik, Srikar Dronamraju, Boqun Feng,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390, Lai Jiangshan,
	samir

Hi Paul.

On 4/30/26 9:40 PM, Paul E. McKenney wrote:
> On Thu, Apr 30, 2026 at 12:38:16PM +0530, Shrikanth Hegde wrote:
>> Hi Paul.
>>
>> On 4/29/26 11:31 PM, Paul E. McKenney wrote:
> 
> [ . . . ]
> 
> Sorry, missed one...
> 
>>> ------------------------------------------------------------------------
>>>
>>> commit f8d5aaaf90f8294890802ce8dccbafd9850ac5f9
>>> Author: Paul E. McKenney <paulmck@kernel.org>
>>> Date:   Thu Apr 9 11:16:02 2026 -0700
>>>
>>>       srcu: Don't queue workqueue handlers to never-online CPUs
>>>       While an srcu_struct structure is in the midst of switching from CPU-0
>>>       to all-CPUs state, it can attempt to invoke callbacks for CPUs that
>>>       have never been online.  Worse yet, it can attempt in invoke callbacks
>>>       for CPUs that never will be online due to not being present in the
>>
>> for CPUs that never will be online due to being present in the cpu_possible_mask?
> 
> Exactly.
> 
> Just because a CPU is in cpu_possible_mask doesn't mean that it will
> ever actually come online.  For example, for single-threaded performance
> reasons, a given system might choose to bring online only one CPU from
> each hypertheaded core.  In that case, the other CPU in each hyperthreaded
> core could be in the cpu_possible_mask, but would never come online.
> 
> 							Thanx, Paul
> 

Nit: I was suggesting *not* is probably not needed in that changelog.
I agree with explanation.

>>>       cpu_possible_mask.  This can cause hangs on s390, which is not set up to
>>>       deal with workqueue handlers being scheduled on such CPUs.  This commit
>>>       therefore causes Tree SRCU to refrain from queueing workqueue handlers
>>>       on CPUs that have not yet (and might never) come online.
>>>       Because callbacks are not invoked on CPUs that have not been
>>>       online, it is an error to invoke call_srcu(), synchronize_srcu(), or
>>>       synchronize_srcu_expedited() on a CPU that is not yet fully online.
>>>       However, it turns out to be less code to redirect the callbacks
>>>       from too-early invocations of call_srcu() than to warn about such
>>>       invocations.  This commit therefore also redirects callbacks queued on
>>>       not-yet-fully-online CPUs to the boot CPU.
>>>       Reported-by: Vasily Gorbik <gor@linux.ibm.com>
>>>       Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>>       Tested-by: Vasily Gorbik <gor@linux.ibm.com>
>>>       Cc: Tejun Heo <tj@kernel.org>


Alright. With those two explanations, this LGTM.

Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-05-01 13:17                         ` Shrikanth Hegde
@ 2026-05-01 14:00                           ` Paul E. McKenney
  0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2026-05-01 14:00 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Tejun Heo, Vasily Gorbik, Srikar Dronamraju, Boqun Feng,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390, Lai Jiangshan,
	samir

On Fri, May 01, 2026 at 06:47:55PM +0530, Shrikanth Hegde wrote:
> Hi Paul.
> 
> On 4/30/26 9:40 PM, Paul E. McKenney wrote:
> > On Thu, Apr 30, 2026 at 12:38:16PM +0530, Shrikanth Hegde wrote:
> > > Hi Paul.
> > > 
> > > On 4/29/26 11:31 PM, Paul E. McKenney wrote:
> > 
> > [ . . . ]
> > 
> > Sorry, missed one...
> > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > commit f8d5aaaf90f8294890802ce8dccbafd9850ac5f9
> > > > Author: Paul E. McKenney <paulmck@kernel.org>
> > > > Date:   Thu Apr 9 11:16:02 2026 -0700
> > > > 
> > > >       srcu: Don't queue workqueue handlers to never-online CPUs
> > > >       While an srcu_struct structure is in the midst of switching from CPU-0
> > > >       to all-CPUs state, it can attempt to invoke callbacks for CPUs that
> > > >       have never been online.  Worse yet, it can attempt in invoke callbacks
> > > >       for CPUs that never will be online due to not being present in the
> > > 
> > > for CPUs that never will be online due to being present in the cpu_possible_mask?
> > 
> > Exactly.
> > 
> > Just because a CPU is in cpu_possible_mask doesn't mean that it will
> > ever actually come online.  For example, for single-threaded performance
> > reasons, a given system might choose to bring online only one CPU from
> > each hypertheaded core.  In that case, the other CPU in each hyperthreaded
> > core could be in the cpu_possible_mask, but would never come online.
> 
> Nit: I was suggesting *not* is probably not needed in that changelog.
> I agree with explanation.

Fair point, although before the fix it really would quite happily invoke
queue_work_on() for CPUs *not* in the cpu_possible_mask.  So I believe
that the original sentence is correct.

Me, I thought that you were asking if this also applied to CPUs in
cpu_possible_mask that were never going to come online.  I could change
this sentence to something like:

	Worse yet, it can attempt in invoke callbacks for CPUs that
	never will be online, even including imaginary CPUs not in
	cpu_possible_mask.

Would that help?

> > > >       cpu_possible_mask.  This can cause hangs on s390, which is not set up to
> > > >       deal with workqueue handlers being scheduled on such CPUs.  This commit
> > > >       therefore causes Tree SRCU to refrain from queueing workqueue handlers
> > > >       on CPUs that have not yet (and might never) come online.
> > > >       Because callbacks are not invoked on CPUs that have not been
> > > >       online, it is an error to invoke call_srcu(), synchronize_srcu(), or
> > > >       synchronize_srcu_expedited() on a CPU that is not yet fully online.
> > > >       However, it turns out to be less code to redirect the callbacks
> > > >       from too-early invocations of call_srcu() than to warn about such
> > > >       invocations.  This commit therefore also redirects callbacks queued on
> > > >       not-yet-fully-online CPUs to the boot CPU.
> > > >       Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> > > >       Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > >       Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> > > >       Cc: Tejun Heo <tj@kernel.org>
> 
> 
> Alright. With those two explanations, this LGTM.
> 
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>

Thank you!

Unless you tell me otherwise, I will make the change I suggested above
and add your Reviewed-by.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-10 18:53         ` Tejun Heo
  2026-04-10 19:17           ` Paul E. McKenney
  2026-04-29 15:00           ` Srikar Dronamraju
@ 2026-04-29 18:17           ` Samir M
  2 siblings, 0 replies; 32+ messages in thread
From: Samir M @ 2026-04-29 18:17 UTC (permalink / raw)
  To: Tejun Heo, Boqun Feng
  Cc: Vasily Gorbik, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan, Shrikanth Hegde


On 11/04/26 12:23 am, Tejun Heo wrote:
> Hello,
>
> On Thu, Apr 09, 2026 at 11:10:04AM -0700, Boqun Feng wrote:
>> On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
>>> On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
>>>> On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
>>>>> On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
>>>>>> Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
>>>>>> non-preemptible") defers srcu_node tree allocation when called under
>>>>>> raw spinlock, putting SRCU through ~6 transitional grace periods
>>>>>> (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
>>>>>> uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
>>>>>> for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
>>>>>> per-CPU pools directly - pools for not-online CPUs have no workers,
>>>>> [Cc workqueue]
>>>>>
>>>>> Hmm.. I thought for offline CPUs the corresponding worker pools become a
>>>>> unbound one hence there are still workers?
>>>>>
>>>> Ah, as Paul replied in another email, the problem was because these CPUs
>>>> had never been onlined, so they don't even have unbound workers?
>>> Hahaha, we do initialize worker pool for every possible CPU but the
>>> transition to unbound operation happens in the hot unplug callback. We
>> ;-) ;-) ;-)
>>
>>> probably need to do some of the hot unplug operation during init if the CPU
>> Seems that we (mostly Paul) have our own trick to track whether a CPU
>> has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
>> used it in his fix [1]. And I think it won't be that hard to copy it
>> into workqueue and let queue_work_on() use it so that if the user queues
>> a work on a never-onlined CPU, it can detect it (with a warning?) and do
>> something?
> The easiest way to do this is just creating the initial workers for all
> possible pools. Please see below. However, the downside is that it's going
> to create all workers for all possible cpus. This isn't a problem for
> anybody else but these IBM mainframes often come up with a lot of possible
> but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> be negligible on some configurations.
>
> IBM folks, is that okay?
>
> Also, why do you need to queue work items on an offline CPU? Do they
> actually have to be per-cpu? Can you get away with using an unbound
> workqueue?
>
> Thanks.

Hi Tejun,

Thank you for the patch addressing the workqueue lockup issue.
workqueue lockup issue(PowerPC): 
https://lore.kernel.org/lkml/97a7d011-d573-4754-9e5d-68b562c64089@linux.ibm.com/

Regarding the approach of creating workers for all possible CPUs: On IBM 
PowerPC, we commonly see configurations with a large number of possible 
CPUs for capacity management. For example, systems with 384 possible 
CPUs but only 80 online. This is by design - the additional capacity 
exists for dynamic activation based on licensing and workload requirements.

Creating workers for all 384 possible CPUs upfront would mean allocating 
resources for 304 workers that may never be used. While I understand 
this is the simplest solution to the race condition, I'm concerned about 
the memory overhead on such configurations.

Two questions:
1. What is the per-worker memory footprint? Can we quantify the overhead 
for systems with large possible-but-offline CPU counts?
2. Would an alternative approach be feasible - such as lazy worker 
creation during CPU hotplug, or deferring worker creation until a CPU 
actually comes online?

I can test this patch on our IBM PowerPC systems to measure the actual 
memory impact and verify the POOL_DISASSOCIATED handling works correctly 
with large offline CPU counts. Would that be helpful?

Please let me know your thoughts.

Thanks,
Samir


> From: Tejun Heo <tj@kernel.org>
> Subject: workqueue: Create workers for all possible CPUs on init
>
> Per-CPU worker pools are initialized for every possible CPU during early boot,
> but workqueue_init() only creates initial workers for online CPUs. On systems
> where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
> 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
> set but no workers. Any work item queued on such a CPU hangs indefinitely.
>
> This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") which made SRCU schedule callbacks on all possible CPUs
> during size transitions, triggering workqueue lockup warnings for all
> never-onlined CPUs.
>
> Create workers for all possible CPUs during init, not just online ones. For
> online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
> worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
> remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
> execute on any CPU. When the CPU later comes online, rebind_workers() handles
> the transition to associated operation as usual.
>
> Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Boqun Feng <boqun@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> ---
>   kernel/workqueue.c |    5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -8068,9 +8068,10 @@ void __init workqueue_init(void)
>   		for_each_bh_worker_pool(pool, cpu)
>   			BUG_ON(!create_worker(pool));
>
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>   		for_each_cpu_worker_pool(pool, cpu) {
> -			pool->flags &= ~POOL_DISASSOCIATED;
> +			if (cpu_online(cpu))
> +				pool->flags &= ~POOL_DISASSOCIATED;
>   			BUG_ON(!create_worker(pool));
>   		}
>   	}

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2026-05-01 14:00 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-09 13:08 BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition Vasily Gorbik
2026-04-09 17:22 ` Paul E. McKenney
2026-04-09 19:15   ` Vasily Gorbik
2026-04-09 20:10     ` Paul E. McKenney
2026-04-10  4:03       ` Paul E. McKenney
2026-04-14 19:24         ` Paul E. McKenney
2026-04-29 17:50           ` Vasily Gorbik
2026-04-29 18:05             ` Paul E. McKenney
2026-04-29 18:23               ` Vasily Gorbik
2026-04-09 17:26 ` Boqun Feng
2026-04-09 17:40   ` Boqun Feng
2026-04-09 17:47     ` Tejun Heo
2026-04-09 17:48       ` Tejun Heo
2026-04-09 18:04         ` Paul E. McKenney
2026-04-09 18:09           ` Tejun Heo
2026-04-09 18:15             ` Paul E. McKenney
2026-04-09 18:10       ` Boqun Feng
2026-04-09 18:27         ` Paul E. McKenney
2026-04-10 18:53         ` Tejun Heo
2026-04-10 19:17           ` Paul E. McKenney
2026-04-10 19:29             ` Tejun Heo
2026-04-29 15:00           ` Srikar Dronamraju
2026-04-29 17:08             ` Vasily Gorbik
2026-04-29 17:18               ` Paul E. McKenney
2026-04-29 17:44                 ` Shrikanth Hegde
2026-04-29 18:01                   ` Paul E. McKenney
2026-04-30  7:08                     ` Shrikanth Hegde
2026-04-30 16:05                       ` Paul E. McKenney
2026-04-30 16:10                       ` Paul E. McKenney
2026-05-01 13:17                         ` Shrikanth Hegde
2026-05-01 14:00                           ` Paul E. McKenney
2026-04-29 18:17           ` Samir M

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox