From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
Joel Fernandes <joel@joelfernandes.org>,
Neeraj Upadhyay <neeraj.upadhyay@amd.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Uladzislau Rezki <urezki@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>, rcu <rcu@vger.kernel.org>
Subject: [PATCH 03/11] rcu/nocb: Assert no callbacks while nocb kthread allocation fails
Date: Thu, 30 May 2024 15:45:44 +0200 [thread overview]
Message-ID: <20240530134552.5467-4-frederic@kernel.org> (raw)
In-Reply-To: <20240530134552.5467-1-frederic@kernel.org>
When a NOCB CPU fails to create a nocb kthread on bringup, the CPU is
then deoffloaded. The barrier mutex is locked at this stage. It is
typically used to protect against concurrent (de-)offloading and/or
concurrent rcu_barrier() that would otherwise risk a nocb locking
imbalance. However:
* rcu_barrier() can't run concurrently if it's the boot CPU on early
boot-up.
* rcu_barrier() can run concurrently if it's a secondary CPU but it is
expected to see 0 callbacks on this target because it's the first
time it boots.
* (de-)offloading can't happen concurrently with smp_init(), as
rcutorture is initialized later, at least not before device_initcall(),
and userspace isn't available yet.
* (de-)offloading can't happen concurrently with cpu_up(), courtesy of
cpu_hotplug_lock.
But:
* The lazy shrinker might run concurrently with cpu_up(). It shouldn't
try to grab the nocb_lock and risk an imbalance due to lazy_len
supposed to be 0 but be extra cautious.
* Also be cautious against resume from hibernation potential subtleties.
So keep the locking and add some assertions and comments.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/tree_nocb.h | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f4112fc663a7..fdd0616f2fd1 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1442,7 +1442,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
"rcuog/%d", rdp_gp->cpu);
if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
- goto end;
+ goto err;
}
WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
if (kthread_prio)
@@ -1454,7 +1454,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
t = kthread_create(rcu_nocb_cb_kthread, rdp,
"rcuo%c/%d", rcu_state.abbr, cpu);
if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
- goto end;
+ goto err;
if (rcu_rdp_is_offloaded(rdp))
wake_up_process(t);
@@ -1467,7 +1467,15 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
WRITE_ONCE(rdp->nocb_cb_kthread, t);
WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
return;
-end:
+
+err:
+ /*
+ * No need to protect against concurrent rcu_barrier()
+ * because the number of callbacks should be 0 for a non-boot CPU,
+ * therefore rcu_barrier() shouldn't even try to grab the nocb_lock.
+ * But hold barrier_mutex to avoid nocb_lock imbalance from shrinker.
+ */
+ WARN_ON_ONCE(system_state > SYSTEM_BOOTING && rcu_segcblist_n_cbs(&rdp->cblist));
mutex_lock(&rcu_state.barrier_mutex);
if (rcu_rdp_is_offloaded(rdp)) {
rcu_nocb_rdp_deoffload(rdp);
--
2.45.1
next prev parent reply other threads:[~2024-05-30 13:46 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-30 13:45 [PATCH 00/11] rcu/nocb: (De-)offloading on offline CPUs Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 01/11] rcu/nocb: Introduce RCU_NOCB_LOCKDEP_WARN() Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 02/11] rcu/nocb: Move nocb field at the end of state struct Frederic Weisbecker
2024-05-30 13:45 ` Frederic Weisbecker [this message]
2024-05-30 13:45 ` [PATCH 04/11] rcu/nocb: Introduce nocb mutex Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 05/11] rcu/nocb: (De-)offload callbacks on offline CPUs only Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 06/11] rcu/nocb: Remove halfway (de-)offloading handling from bypass Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 07/11] rcu/nocb: Remove halfway (de-)offloading handling from rcu_core()'s QS reporting Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 08/11] rcu/nocb: Remove halfway (de-)offloading handling from rcu_core Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 09/11] rcu/nocb: Remove SEGCBLIST_RCU_CORE Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 10/11] rcu/nocb: Remove SEGCBLIST_KTHREAD_CB Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 11/11] rcu/nocb: Simplify (de-)offloading state machine Frederic Weisbecker
2024-07-02 23:19 ` Boqun Feng
2024-07-03 12:17 ` Frederic Weisbecker
2024-07-03 22:56 ` [PATCH 11/11 v2] " Frederic Weisbecker
2024-07-03 23:52 ` Paul E. McKenney
2024-07-19 17:30 ` [PATCH 00/11] rcu/nocb: (De-)offloading on offline CPUs Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240530134552.5467-4-frederic@kernel.org \
--to=frederic@kernel.org \
--cc=boqun.feng@gmail.com \
--cc=joel@joelfernandes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=neeraj.upadhyay@amd.com \
--cc=paulmck@kernel.org \
--cc=qiang.zhang1211@gmail.com \
--cc=rcu@vger.kernel.org \
--cc=urezki@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox