public inbox for linux-s390@vger.kernel.org
From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: paulmck@kernel.org
Cc: Tejun Heo <tj@kernel.org>, Vasily Gorbik <gor@linux.ibm.com>,
	Srikar Dronamraju <srikar@linux.ibm.com>,
	Boqun Feng <boqun@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	samir@linux.ibm.com
Subject: Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
Date: Thu, 30 Apr 2026 12:38:16 +0530
Message-ID: <868cbc25-45a9-4476-b77e-7f878f1cd42c@linux.ibm.com>
In-Reply-To: <7b5665b4-ddd0-404a-8314-fd0a170db458@paulmck-laptop>

Hi Paul.

On 4/29/26 11:31 PM, Paul E. McKenney wrote:

>> That mask = ~0 really looks uncomfortable to me. What does it mean?
>> It might even end up sending to non-possible CPUs without proper checks.
>>
>> Should it use either cpumask_setall() or cpu_online_mask?
>>
>> The rcu_cpu_beenfullyonline() check in your current patch indicates that
>> the code around srcu_schedule_cbs_sdp() already handles hotplug, right?
>> In that case, would just setting mask = cpu_online_mask work?
> 
> Agreed.  Which is why I have this commit queued:
> 
> f8d5aaaf90f8 ("srcu: Don't queue workqueue handlers to never-online CPUs")
> 
> This is currently slated for the upcoming merge window, but if you
> need it sooner, please let us know.  Please see the end of this email
> for the full commit.
> 
> 
> 							Thanx, Paul
> 
>>> /**
>>>    * queue_work_on - queue work on specific cpu
>>>    * @cpu: CPU number to execute work on
>>>    * @wq: workqueue to use
>>>    * @work: work to queue
>>>    *
>>>    * We queue the work to a specific CPU, the caller must ensure it
>>>    * can't go away.  Callers that fail to ensure that the specified
>>>    * CPU cannot go away will execute on a randomly chosen CPU.
>>>    * But note well that callers specifying a CPU that never has been
>>>    * online will get a splat.
>>>    *
>>>    * Return: %false if @work was already on a queue, %true otherwise.
>>>    */
>>
>>
>> In that case, making offline CPUs have an unbound workqueue is wrong, no?
>>
>> It might encourage more users to abuse the queue_work_on() interface to
>> send work to offline CPUs without any checks, and the onus would then fall
>> on workqueue to dispatch to unbound wqs.
>>
>> So I think it is better to put the guardrails in SRCU instead of making
>> any change in workqueue.
> 
> ------------------------------------------------------------------------
> 
> commit f8d5aaaf90f8294890802ce8dccbafd9850ac5f9
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Thu Apr 9 11:16:02 2026 -0700
> 
>      srcu: Don't queue workqueue handlers to never-online CPUs
>      
>      While an srcu_struct structure is in the midst of switching from CPU-0
>      to all-CPUs state, it can attempt to invoke callbacks for CPUs that
>      have never been online.  Worse yet, it can attempt to invoke callbacks
>      for CPUs that never will be online due to not being present in the

Should this read "for CPUs that never will be online due to being present in the cpu_possible_mask"?

>      cpu_possible_mask.  This can cause hangs on s390, which is not set up to
>      deal with workqueue handlers being scheduled on such CPUs.  This commit
>      therefore causes Tree SRCU to refrain from queueing workqueue handlers
>      on CPUs that have not yet (and might never) come online.
>      
>      Because callbacks are not invoked on CPUs that have not been
>      online, it is an error to invoke call_srcu(), synchronize_srcu(), or
>      synchronize_srcu_expedited() on a CPU that is not yet fully online.
>      However, it turns out to be less code to redirect the callbacks
>      from too-early invocations of call_srcu() than to warn about such
>      invocations.  This commit therefore also redirects callbacks queued on
>      not-yet-fully-online CPUs to the boot CPU.
>      
>      Reported-by: Vasily Gorbik <gor@linux.ibm.com>
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>      Tested-by: Vasily Gorbik <gor@linux.ibm.com>
>      Cc: Tejun Heo <tj@kernel.org>
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 0d01cd8c4b4a7b..7c2f7cc131f7ae 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>   {
>   	int cpu;
>   
> -	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> -		if (!(mask & (1UL << (cpu - snp->grplo))))
> -			continue;
> -		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> -	}
> +	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
> +		if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
> +			srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>   }
>   
>   /*
> @@ -1322,7 +1320,9 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
>   	 */
>   	idx = __srcu_read_lock_nmisafe(ssp);
>   	ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
> -	if (ss_state < SRCU_SIZE_WAIT_CALL)
> +	// If !rcu_cpu_beenfullyonline(), interrupts are still disabled,
> +	// so no migration is possible in either direction from this CPU.
> +	if (ss_state < SRCU_SIZE_WAIT_CALL || !rcu_cpu_beenfullyonline(raw_smp_processor_id()))

How can this happen? For raw_smp_processor_id() to return a CPU that is not
yet fully online, this code would have to be running on that CPU.

>   		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
>   	else
>   		sdp = raw_cpu_ptr(ssp->sda);

