From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: paulmck@kernel.org
Cc: Tejun Heo <tj@kernel.org>, Vasily Gorbik <gor@linux.ibm.com>,
Srikar Dronamraju <srikar@linux.ibm.com>,
Boqun Feng <boqun@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Uladzislau Rezki <urezki@gmail.com>,
rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-s390@vger.kernel.org,
Lai Jiangshan <jiangshanlai@gmail.com>,
samir@linux.ibm.com
Subject: Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
Date: Thu, 30 Apr 2026 12:38:16 +0530 [thread overview]
Message-ID: <868cbc25-45a9-4476-b77e-7f878f1cd42c@linux.ibm.com> (raw)
In-Reply-To: <7b5665b4-ddd0-404a-8314-fd0a170db458@paulmck-laptop>
Hi Paul.
On 4/29/26 11:31 PM, Paul E. McKenney wrote:
>> That mask = ~0 really looks uncomfortable to me. What does it mean?
>> Without proper checks it might even end up sending work to non-possible CPUs.
>>
>> Should it use cpumask_setall() instead, or cpu_online_mask?
>>
>> Your current patch using rcu_cpu_beenfullyonline() indicates that the code
>> around srcu_schedule_cbs_sdp() already handles hotplug, right?
>> In that case, wouldn't just setting mask = cpu_online_mask work?
>
> Agreed. Which is why I have this commit queued:
>
> f8d5aaaf90f8 ("srcu: Don't queue workqueue handlers to never-online CPUs")
>
> This is currently slated for the upcoming merge window, but if you
> need it sooner, please let us know. Please see the end of this email
> for the full commit.
>
>
> Thanx, Paul
>
>>> /**
>>> * queue_work_on - queue work on specific cpu
>>> * @cpu: CPU number to execute work on
>>> * @wq: workqueue to use
>>> * @work: work to queue
>>> *
>>> * We queue the work to a specific CPU, the caller must ensure it
>>> * can't go away. Callers that fail to ensure that the specified
>>> * CPU cannot go away will execute on a randomly chosen CPU.
>>> * But note well that callers specifying a CPU that never has been
>>> * online will get a splat.
>>> *
>>> * Return: %false if @work was already on a queue, %true otherwise.
>>> */
>>
>>
>> In that case, making offline CPUs fall back to an unbound workqueue is wrong, no?
>>
>> It might encourage more users to abuse the queue_work_on() interface to
>> send work to offline CPUs without any checks, and the onus then falls on
>> the workqueue code to dispatch to unbound wqs.
>>
>> So I think it is better to put the guardrails in SRCU instead of changing
>> the workqueue code.
>
> ------------------------------------------------------------------------
>
> commit f8d5aaaf90f8294890802ce8dccbafd9850ac5f9
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date: Thu Apr 9 11:16:02 2026 -0700
>
> srcu: Don't queue workqueue handlers to never-online CPUs
>
> While an srcu_struct structure is in the midst of switching from CPU-0
> to all-CPUs state, it can attempt to invoke callbacks for CPUs that
> have never been online. Worse yet, it can attempt to invoke callbacks
> for CPUs that never will be online due to not being present in the
Should this read "for CPUs that never will be online due to being present in the cpu_possible_mask"?
> cpu_possible_mask. This can cause hangs on s390, which is not set up to
> deal with workqueue handlers being scheduled on such CPUs. This commit
> therefore causes Tree SRCU to refrain from queueing workqueue handlers
> on CPUs that have not yet (and might never) come online.
>
> Because callbacks are not invoked on CPUs that have not been
> online, it is an error to invoke call_srcu(), synchronize_srcu(), or
> synchronize_srcu_expedited() on a CPU that is not yet fully online.
> However, it turns out to be less code to redirect the callbacks
> from too-early invocations of call_srcu() than to warn about such
> invocations. This commit therefore also redirects callbacks queued on
> not-yet-fully-online CPUs to the boot CPU.
>
> Reported-by: Vasily Gorbik <gor@linux.ibm.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Tested-by: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Tejun Heo <tj@kernel.org>
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 0d01cd8c4b4a7b..7c2f7cc131f7ae 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
> {
> int cpu;
>
> - for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> - if (!(mask & (1UL << (cpu - snp->grplo))))
> - continue;
> - srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> - }
> + for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
> + if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
> + srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> }
>
> /*
> @@ -1322,7 +1320,9 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
> */
> idx = __srcu_read_lock_nmisafe(ssp);
> ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
> - if (ss_state < SRCU_SIZE_WAIT_CALL)
> + // If !rcu_cpu_beenfullyonline(), interrupts are still disabled,
> + // so no migration is possible in either direction from this CPU.
> + if (ss_state < SRCU_SIZE_WAIT_CALL || !rcu_cpu_beenfullyonline(raw_smp_processor_id()))
How can this happen? For raw_smp_processor_id() to return a not-fully-online
CPU, this code would have to be running on that offline CPU.
> sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> else
> sdp = raw_cpu_ptr(ssp->sda);