From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org,
Josh Triplett <josh@joshtriplett.org>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Marco Elver <elver@google.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>,
"Uladzislau Rezki (Sony)" <urezki@gmail.com>,
fweisbec@gmail.com, neeraj.iitr10@gmail.com
Subject: Re: [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb()
Date: Fri, 6 Nov 2020 17:41:41 -0500 [thread overview]
Message-ID: <20201106224141.GA1397669@google.com> (raw)
In-Reply-To: <20201105185551.GO3249@paulmck-ThinkPad-P72>
On Thu, Nov 05, 2020 at 10:55:51AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 03, 2020 at 09:26:03AM -0500, Joel Fernandes (Google) wrote:
> > Memory barriers are needed when updating the full length of the
> > segcblist, however it is not fully clearly why one is needed before and
> > after. This patch therefore adds additional comments to the function
> > header to explain it.
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> Looks good, thank you! As always, I could not resist the urge to
> do a bit of wordsmithing, so that the queued commit is as shown
> below. Please let me know if I messed anything up.
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 7dac7adefcae7558b3a85a16f51186d621623733
> Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> Date: Tue Nov 3 09:26:03 2020 -0500
>
> rcu/segcblist: Add additional comments to explain smp_mb()
>
> One counter-intuitive property of RCU is the fact that full memory
> barriers are needed both before and after updates to the full
> (non-segmented) length. This patch therefore helps to assist the
> reader's intuition by adding appropriate comments.
>
> [ paulmck: Wordsmithing. ]
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index bb246d8..b6dda7c 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -94,17 +94,77 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
> * field to disagree with the actual number of callbacks on the structure.
> * This increase is fully ordered with respect to the callers accesses
> * both before and after.
> + *
> + * So why on earth is a memory barrier required both before and after
> + * the update to the ->len field???
> + *
> + * The reason is that rcu_barrier() locklessly samples each CPU's ->len
> + * field, and if a given CPU's field is zero, avoids IPIing that CPU.
> + * This can of course race with both queuing and invoking of callbacks.
> + * Failng to correctly handle either of these races could result in
> + * rcu_barrier() failing to IPI a CPU that actually had callbacks queued
> + * which rcu_barrier() was obligated to wait on. And if rcu_barrier()
> + * failed to wait on such a callback, unloading certain kernel modules
> + * would result in calls to functions whose code was no longer present in
> + * the kernel, for but one example.
> + *
> + * Therefore, ->len transitions from 1->0 and 0->1 have to be carefully
> + * ordered with respect with both list modifications and the rcu_barrier().
> + *
> + * The queuing case is CASE 1 and the invoking case is CASE 2.
> + *
> + * CASE 1: Suppose that CPU 0 has no callbacks queued, but invokes
> + * call_rcu() just as CPU 1 invokes rcu_barrier(). CPU 0's ->len field
> + * will transition from 0->1, which is one of the transitions that must be
> + * handled carefully. Without the full memory barriers before the ->len
> + * update and at the beginning of rcu_barrier(), the following could happen:
> + *
> + * CPU 0 CPU 1
> + *
> + * call_rcu().
> + * rcu_barrier() sees ->len as 0.
> + * set ->len = 1.
> + * rcu_barrier() does nothing.
> + * module is unloaded.
> + * callback invokes unloaded function!
> + *
> + * With the full barriers, any case where rcu_barrier() sees ->len as 0 will
> + * have unambiguously preceded the return from the racing call_rcu(), which
> + * means that this call_rcu() invocation is OK to not wait on. After all,
> + * you are supposed to make sure that any problematic call_rcu() invocations
> + * happen before the rcu_barrier().
Unfortunately, I did not understand your explanation. To me the barrier
*before* the setting of length is needed on CPU0 only for 1->0 transition
(Dequeue). Where as in
your example above, it is for enqueue.
This was case 1 in my patch:
+ * To illustrate the problematic scenario to avoid:
+ * P0 (what P1 sees) P1
+ * set len = 0
+ * rcu_barrier sees len as 0
+ * dequeue from list
+ * rcu_barrier does nothing.
+ *
Here, P1 should see the transition of 1->0 *after* the CB is dequeued. Which
means you needed a memory barrier *before* the setting of len from 1->0 and
*after* the dequeue. IOW, rcu_barrier should 'see' the memory ordering as:
1. dequeue
2. set len from 1 -> 0.
For the enqueue case, it is the reverse, rcu_barrier should see:
1. set len from 0 -> 1
2. enqueue
Either way, the point I think I was trying to make is that the length should
always be seen as non-zero if the list is non-empty. Basically, the
rcu_barrier() should always not do the fast-path if the list is non-empty.
Worst-case it might do the slow-path when it is not necessary, but it should
never do the fast-path when it was not supposed to.
Thoughts?
thanks,
- Joel
> + *
> + *
> + * CASE 2: Suppose that CPU 0 is invoking its last callback just as CPU 1 invokes
> + * rcu_barrier(). CPU 0's ->len field will transition from 1->0, which is one
> + * of the transitions that must be handled carefully. Without the full memory
> + * barriers after the ->len update and at the end of rcu_barrier(), the following
> + * could happen:
> + *
> + * CPU 0 CPU 1
> + *
> + * start invoking last callback
> + * set ->len = 0 (reordered)
> + * rcu_barrier() sees ->len as 0
> + * rcu_barrier() does nothing.
> + * module is unloaded
> + * callback executing after unloaded!
> + *
> + * With the full barriers, any case where rcu_barrier() sees ->len as 0
> + * will be fully ordered after the completion of the callback function,
> + * so that the module unloading operation is completely safe.
> + *
> */
> void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
> {
> #ifdef CONFIG_RCU_NOCB_CPU
> - smp_mb__before_atomic(); /* Up to the caller! */
> + smp_mb__before_atomic(); // Read header comment above.
> atomic_long_add(v, &rsclp->len);
> - smp_mb__after_atomic(); /* Up to the caller! */
> + smp_mb__after_atomic(); // Read header comment above.
> #else
> - smp_mb(); /* Up to the caller! */
> + smp_mb(); // Read header comment above.
> WRITE_ONCE(rsclp->len, rsclp->len + v);
> - smp_mb(); /* Up to the caller! */
> + smp_mb(); // Read header comment above.
> #endif
> }
>
next prev parent reply other threads:[~2020-11-06 22:41 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-03 14:25 [PATCH v9 0/7] Add support for length of each segment in the segcblist Joel Fernandes (Google)
2020-11-03 14:25 ` [PATCH v9 1/7] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
2020-11-04 0:05 ` Paul E. McKenney
2020-11-04 15:09 ` Joel Fernandes
2020-11-03 14:25 ` [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
2020-11-04 13:37 ` Neeraj Upadhyay
2020-11-04 17:01 ` Paul E. McKenney
2020-11-07 0:01 ` Joel Fernandes
2020-11-07 0:18 ` Joel Fernandes
2020-11-07 0:48 ` Paul E. McKenney
2020-11-10 1:39 ` Paul E. McKenney
2020-11-03 14:25 ` [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment Joel Fernandes (Google)
2020-11-03 14:47 ` Frederic Weisbecker
2020-11-03 14:56 ` Joel Fernandes
2020-11-03 15:07 ` Joel Fernandes
2020-11-03 15:18 ` Paul E. McKenney
2020-11-03 15:19 ` Frederic Weisbecker
2020-11-04 13:37 ` Neeraj Upadhyay
2020-11-03 14:26 ` [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
2020-11-03 15:17 ` Frederic Weisbecker
2020-11-04 14:08 ` Paul E. McKenney
2020-11-04 14:33 ` Frederic Weisbecker
2020-11-07 0:05 ` Joel Fernandes
2020-11-04 13:40 ` Neeraj Upadhyay
2020-11-04 15:05 ` Joel Fernandes
2020-11-11 0:35 ` Paul E. McKenney
2020-11-11 13:49 ` Steven Rostedt
2020-11-11 14:08 ` Joel Fernandes
2020-11-03 14:26 ` [PATCH v9 5/7] rcu/segcblist: Remove useless rcupdate.h include Joel Fernandes (Google)
2020-11-05 3:48 ` Paul E. McKenney
2020-11-05 14:28 ` Paul E. McKenney
2020-11-07 0:27 ` Joel Fernandes
2020-11-03 14:26 ` [PATCH v9 6/7] rcu/tree: segcblist: Remove redundant smp_mb()s Joel Fernandes (Google)
2020-11-05 3:57 ` Paul E. McKenney
2020-11-07 0:26 ` Joel Fernandes
2020-11-10 1:41 ` Paul E. McKenney
2020-11-03 14:26 ` [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb() Joel Fernandes (Google)
2020-11-05 18:55 ` Paul E. McKenney
2020-11-06 22:41 ` Joel Fernandes [this message]
2020-11-10 1:28 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201106224141.GA1397669@google.com \
--to=joel@joelfernandes.org \
--cc=elver@google.com \
--cc=fweisbec@gmail.com \
--cc=jiangshanlai@gmail.com \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=neeraj.iitr10@gmail.com \
--cc=paulmck@kernel.org \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=urezki@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.