From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
Joel Fernandes <joel@joelfernandes.org>,
Neeraj Upadhyay <neeraj.upadhyay@amd.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Uladzislau Rezki <urezki@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>, rcu <rcu@vger.kernel.org>
Subject: [PATCH 03/11] rcu/nocb: Assert no callbacks while nocb kthread allocation fails
Date: Thu, 30 May 2024 15:45:44 +0200
Message-ID: <20240530134552.5467-4-frederic@kernel.org>
In-Reply-To: <20240530134552.5467-1-frederic@kernel.org>

When a NOCB CPU fails to create a nocb kthread on bringup, the CPU is
then deoffloaded. The barrier mutex is locked at this stage. It is
typically used to protect against concurrent (de-)offloading and/or
concurrent rcu_barrier() that would otherwise risk a nocb locking
imbalance. However:

* rcu_barrier() can't run concurrently if it's the boot CPU on early
  boot-up.

* rcu_barrier() can run concurrently if it's a secondary CPU, but it is
  expected to see 0 callbacks on this target because it's the first
  time it boots.

* (de-)offloading can't happen concurrently with smp_init(), as
  rcutorture is initialized later, at least not before device_initcall(),
  and userspace isn't available yet.

* (de-)offloading can't happen concurrently with cpu_up(), courtesy of
  cpu_hotplug_lock.

But:

* The lazy shrinker might run concurrently with cpu_up(). It shouldn't
  try to grab the nocb_lock and risk an imbalance, because lazy_len is
  supposed to be 0, but be extra cautious.

* Also be cautious about potential subtleties on resume from
  hibernation.

So keep the locking and add some assertions and comments.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
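Reader's note, not part of the commit: the error path touched by this
patch can be modeled in userspace with the rough sketch below. It is
only an illustration under stated assumptions. Every model_* name is
hypothetical, the pthread mutex merely stands in for
rcu_state.barrier_mutex, and the unlock after deoffloading is assumed
because the last hunk ends before that point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's system_state ordering. */
enum model_system_state { MODEL_BOOTING, MODEL_SCHEDULING, MODEL_RUNNING };

/* Hypothetical stand-in for the per-CPU rcu_data fields that matter here. */
struct model_rdp {
	long n_cbs;	/* models rcu_segcblist_n_cbs(&rdp->cblist) */
	bool offloaded;	/* models rcu_rdp_is_offloaded(rdp) */
};

/* Stands in for rcu_state.barrier_mutex. */
static pthread_mutex_t model_barrier_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Model of the "err:" path taken when a nocb kthread cannot be created.
 * Returns true when the modeled assertion would have fired, that is,
 * when callbacks are queued on a CPU brought up after early boot.
 */
static bool model_spawn_failed(enum model_system_state state,
			       struct model_rdp *rdp)
{
	bool warned;

	assert(rdp != NULL);

	/* Past early boot, the CPU is expected to have no callbacks. */
	warned = state > MODEL_BOOTING && rdp->n_cbs != 0;

	/* The lock is kept anyway, out of caution against the shrinker. */
	pthread_mutex_lock(&model_barrier_mutex);
	if (rdp->offloaded)
		rdp->offloaded = false;	/* models rcu_nocb_rdp_deoffload() */
	pthread_mutex_unlock(&model_barrier_mutex);	/* assumed */

	return warned;
}
```

In this model, a secondary CPU with no callbacks is deoffloaded without
a warning, while a non-zero callback count after the booting state is
reported and the CPU is still deoffloaded under the mutex.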
 kernel/rcu/tree_nocb.h | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f4112fc663a7..fdd0616f2fd1 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1442,7 +1442,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
 				"rcuog/%d", rdp_gp->cpu);
 		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
 			mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
-			goto end;
+			goto err;
 		}
 		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
 		if (kthread_prio)
@@ -1454,7 +1454,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
 	t = kthread_create(rcu_nocb_cb_kthread, rdp,
 			   "rcuo%c/%d", rcu_state.abbr, cpu);
 	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
-		goto end;
+		goto err;
 
 	if (rcu_rdp_is_offloaded(rdp))
 		wake_up_process(t);
@@ -1467,7 +1467,15 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
 	WRITE_ONCE(rdp->nocb_cb_kthread, t);
 	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
 	return;
-end:
+
+err:
+	/*
+	 * No need to protect against concurrent rcu_barrier()
+	 * because the number of callbacks should be 0 for a non-boot CPU,
+	 * therefore rcu_barrier() shouldn't even try to grab the nocb_lock.
+	 * But hold barrier_mutex to avoid nocb_lock imbalance from shrinker.
+	 */
+	WARN_ON_ONCE(system_state > SYSTEM_BOOTING && rcu_segcblist_n_cbs(&rdp->cblist));
 	mutex_lock(&rcu_state.barrier_mutex);
 	if (rcu_rdp_is_offloaded(rdp)) {
 		rcu_nocb_rdp_deoffload(rdp);
--
2.45.1
Thread overview: 17+ messages
2024-05-30 13:45 [PATCH 00/11] rcu/nocb: (De-)offloading on offline CPUs Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 01/11] rcu/nocb: Introduce RCU_NOCB_LOCKDEP_WARN() Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 02/11] rcu/nocb: Move nocb field at the end of state struct Frederic Weisbecker
2024-05-30 13:45 ` Frederic Weisbecker [this message]
2024-05-30 13:45 ` [PATCH 04/11] rcu/nocb: Introduce nocb mutex Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 05/11] rcu/nocb: (De-)offload callbacks on offline CPUs only Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 06/11] rcu/nocb: Remove halfway (de-)offloading handling from bypass Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 07/11] rcu/nocb: Remove halfway (de-)offloading handling from rcu_core()'s QS reporting Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 08/11] rcu/nocb: Remove halfway (de-)offloading handling from rcu_core Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 09/11] rcu/nocb: Remove SEGCBLIST_RCU_CORE Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 10/11] rcu/nocb: Remove SEGCBLIST_KTHREAD_CB Frederic Weisbecker
2024-05-30 13:45 ` [PATCH 11/11] rcu/nocb: Simplify (de-)offloading state machine Frederic Weisbecker
2024-07-02 23:19 ` Boqun Feng
2024-07-03 12:17 ` Frederic Weisbecker
2024-07-03 22:56 ` [PATCH 11/11 v2] " Frederic Weisbecker
2024-07-03 23:52 ` Paul E. McKenney
2024-07-19 17:30 ` [PATCH 00/11] rcu/nocb: (De-)offloading on offline CPUs Paul E. McKenney