From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
Joel Fernandes <joel@joelfernandes.org>,
Josh Triplett <josh@joshtriplett.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Neeraj Upadhyay <neeraj.upadhyay@amd.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Uladzislau Rezki <urezki@gmail.com>, rcu <rcu@vger.kernel.org>,
Yong He <alexyonghe@tencent.com>,
Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
Subject: [PATCH 17/18] srcu: Fix callbacks acceleration mishandling
Date: Fri, 13 Oct 2023 13:59:01 +0200 [thread overview]
Message-ID: <20231013115902.1059735-18-frederic@kernel.org> (raw)
In-Reply-To: <20231013115902.1059735-1-frederic@kernel.org>
SRCU callbacks acceleration might fail if the preceding callbacks
advance also fails. This can happen when the following steps are met:
1) The RCU_WAIT_TAIL segment has callbacks (say for gp_num 8) and the
RCU_NEXT_READY_TAIL also has callbacks (say for gp_num 12).
2) The grace period for RCU_WAIT_TAIL is observed as started but not yet
completed so rcu_seq_current() returns 4 + SRCU_STATE_SCAN1 = 5.
3) This value is passed to rcu_segcblist_advance() which can't move
any segment forward and fails.
4) srcu_gp_start_if_needed() still proceeds with callback acceleration.
But then the call to rcu_seq_snap() observes the grace period for the
RCU_WAIT_TAIL segment (gp_num 8) as completed and the subsequent one
for the RCU_NEXT_READY_TAIL segment as started
(ie: 8 + SRCU_STATE_SCAN1 = 9) so it returns a snapshot of the
next grace period, which is 16.
5) The value of 16 is passed to rcu_segcblist_accelerate() but the
freshly enqueued callback in RCU_NEXT_TAIL can't move to
RCU_NEXT_READY_TAIL which already has callbacks for a previous grace
period (gp_num = 12). So acceleration fails.
6) Note in all these steps, srcu_invoke_callbacks() hadn't had a chance
to run srcu_invoke_callbacks().
Then some very bad outcome may happen if the following happens:
7) Some other CPU races and starts the grace period number 16 before the
CPU handling previous steps had a chance. Therefore srcu_gp_start()
isn't called on the latter sdp to fix the acceleration leak from
previous steps with a new pair of call to advance/accelerate.
8) The grace period 16 completes and srcu_invoke_callbacks() is finally
called. All the callbacks from previous grace periods (8 and 12) are
correctly advanced and executed but callbacks in RCU_NEXT_READY_TAIL
still remain. Then rcu_segcblist_accelerate() is called with a
snaphot of 20.
9) Since nothing started the grace period number 20, callbacks stay
unhandled.
This has been reported in real load:
[3144162.608392] INFO: task kworker/136:12:252684 blocked for more
than 122 seconds.
[3144162.615986] Tainted: G O K 5.4.203-1-tlinux4-0011.1 #1
[3144162.623053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[3144162.631162] kworker/136:12 D 0 252684 2 0x90004000
[3144162.631189] Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
[3144162.631192] Call Trace:
[3144162.631202] __schedule+0x2ee/0x660
[3144162.631206] schedule+0x33/0xa0
[3144162.631209] schedule_timeout+0x1c4/0x340
[3144162.631214] ? update_load_avg+0x82/0x660
[3144162.631217] ? raw_spin_rq_lock_nested+0x1f/0x30
[3144162.631218] wait_for_completion+0x119/0x180
[3144162.631220] ? wake_up_q+0x80/0x80
[3144162.631224] __synchronize_srcu.part.19+0x81/0xb0
[3144162.631226] ? __bpf_trace_rcu_utilization+0x10/0x10
[3144162.631227] synchronize_srcu+0x5f/0xc0
[3144162.631236] irqfd_shutdown+0x3c/0xb0 [kvm]
[3144162.631239] ? __schedule+0x2f6/0x660
[3144162.631243] process_one_work+0x19a/0x3a0
[3144162.631244] worker_thread+0x37/0x3a0
[3144162.631247] kthread+0x117/0x140
[3144162.631247] ? process_one_work+0x3a0/0x3a0
[3144162.631248] ? __kthread_cancel_work+0x40/0x40
[3144162.631250] ret_from_fork+0x1f/0x30
Fix this with taking the snapshot for acceleration _before_ the read
of the current grace period number.
The only side effect of this solution is that callbacks advancing happen
then _after_ the full barrier in rcu_seq_snap(). This is not a problem
because that barrier only cares about:
1) Ordering accesses of the update side before call_srcu() so they don't
bleed.
2) See all the accesses prior to the grace period of the current gp_num
The only things callbacks advancing need to be ordered against are
carried by snp locking.
Reported-by: Yong He <alexyonghe@tencent.com>
Co-developed-by:: Yong He <alexyonghe@tencent.com>
Signed-off-by: Yong He <alexyonghe@tencent.com>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
Link: http://lore.kernel.org/CANZk6aR+CqZaqmMWrC2eRRPY12qAZnDZLwLnHZbNi=xXMB401g@mail.gmail.com
Fixes: da915ad5cf25 ("srcu: Parallelize callback handling")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/srcutree.c | 33 ++++++++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 5602042856b1..9fab9ac36996 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1244,10 +1244,37 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
spin_lock_irqsave_sdp_contention(sdp, &flags);
if (rhp)
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
- rcu_segcblist_advance(&sdp->srcu_cblist,
- rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
+ /*
+ * The snapshot for acceleration must be taken _before_ the read of the
+ * current gp sequence used for advancing, otherwise advancing may fail
+ * and acceleration may then fail too.
+ *
+ * This could happen if:
+ *
+ * 1) The RCU_WAIT_TAIL segment has callbacks (gp_num = X + 4) and the
+ * RCU_NEXT_READY_TAIL also has callbacks (gp_num = X + 8).
+ *
+ * 2) The grace period for RCU_WAIT_TAIL is seen as started but not
+ * completed so rcu_seq_current() returns X + SRCU_STATE_SCAN1.
+ *
+ * 3) This value is passed to rcu_segcblist_advance() which can't move
+ * any segment forward and fails.
+ *
+ * 4) srcu_gp_start_if_needed() still proceeds with callback acceleration.
+ * But then the call to rcu_seq_snap() observes the grace period for the
+ * RCU_WAIT_TAIL segment as completed and the subsequent one for the
+ * RCU_NEXT_READY_TAIL segment as started (ie: X + 4 + SRCU_STATE_SCAN1)
+ * so it returns a snapshot of the next grace period, which is X + 12.
+ *
+ * 5) The value of X + 12 is passed to rcu_segcblist_accelerate() but the
+ * freshly enqueued callback in RCU_NEXT_TAIL can't move to
+ * RCU_NEXT_READY_TAIL which already has callbacks for a previous grace
+ * period (gp_num = X + 8). So acceleration fails.
+ */
s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
- (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
+ rcu_segcblist_advance(&sdp->srcu_cblist,
+ rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
+ WARN_ON_ONCE(!rcu_segcblist_accelerate(&sdp->srcu_cblist, s) && rhp);
if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
sdp->srcu_gp_seq_needed = s;
needgp = true;
--
2.34.1
next prev parent reply other threads:[~2023-10-13 12:00 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 01/18] Revert "checkpatch: Error out if deprecated RCU API used" Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 02/18] srcu: Fix error handling in init_srcu_struct_fields() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 03/18] rcu/tree: Remove superfluous return from void call_rcu* functions Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 04/18] rcu: Add sysfs to provide throttled access to rcu_barrier() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 05/18] rcu: Remove unused function declaration rcu_eqs_special_set() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 06/18] mm: Remove kmem_valid_obj() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 07/18] rcu: Dump memory object info if callback function is invalid Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 08/18] rcu: Eliminate rcu_gp_slow_unregister() false positive Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 09/18] srcu: Fix srcu_struct node grpmask overflow on 64-bit systems Frederic Weisbecker
2023-10-13 12:54 ` David Laight
2023-10-13 14:11 ` Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 10/18] rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 11/18] rcu: Use rcu_segcblist_segempty() instead of open coding it Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 12/18] rcu: Assume IRQS disabled from rcu_report_dead() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 13/18] rcu: Assume rcu_report_dead() is always called locally Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 14/18] rcu: Conditionally build CPU-hotplug teardown callbacks Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 15/18] rcu: Standardize explicit CPU-hotplug calls Frederic Weisbecker
2023-10-13 11:59 ` [PATCH 16/18] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP Frederic Weisbecker
2023-10-13 11:59 ` Frederic Weisbecker [this message]
2023-10-13 11:59 ` [PATCH 18/18] srcu: Only accelerate on enqueue time Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231013115902.1059735-18-frederic@kernel.org \
--to=frederic@kernel.org \
--cc=alexyonghe@tencent.com \
--cc=boqun.feng@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=neeraj.upadhyay@amd.com \
--cc=paulmck@kernel.org \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=urezki@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.