From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
jiangshanlai@gmail.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
fweisbec@gmail.com, oleg@redhat.com, joel.opensrc@gmail.com,
torvalds@linux-foundation.org, npiggin@gmail.com
Subject: Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
Date: Sat, 12 May 2018 07:44:38 -0700 [thread overview]
Message-ID: <20180512144438.GA12826@linux.vnet.ibm.com> (raw)
In-Reply-To: <20180512144002.GI26088@linux.vnet.ibm.com>
On Sat, May 12, 2018 at 07:40:02AM -0700, Paul E. McKenney wrote:
> On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote:
> > On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote:
> > > The rcu_start_this_gp() function had a simple form of funnel locking that
> > > used only the leaves and root of the rcu_node tree, which is fine for
> > > systems with only a few hundred CPUs, but sub-optimal for systems having
> > > thousands of CPUs. This commit therefore adds full-tree funnel locking.
> > >
> > > This variant of funnel locking is unusual in the following ways:
> > >
> > > 1. The leaf-level rcu_node structure's ->lock is held throughout.
> > > Other funnel-locking implementations drop the leaf-level lock
> > > before progressing to the next level of the tree.
> > >
> > > 2. Funnel locking can be started at the root, which is convenient
> > > for code that already holds the root rcu_node structure's ->lock.
> > > Other funnel-locking implementations start at the leaves.
> > >
> > > 3. If an rcu_node structure other than the initial one believes
> > > that a grace period is in progress, it is not necessary to
> > > go further up the tree. This is because grace-period cleanup
> > > scans the full tree, so that marking the need for a subsequent
> > > grace period anywhere in the tree suffices -- but only if
> > > a grace period is currently in progress.
> > >
> > > 4. It is possible that the RCU grace-period kthread has not yet
> > > started, and this case must be handled appropriately.
> > >
> > > However, the general approach of using a tree to control lock contention
> > > is still in place.
> > >
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > ---
> > > kernel/rcu/tree.c | 92 +++++++++++++++++++++----------------------------------
> > > 1 file changed, 35 insertions(+), 57 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 94519c7d552f..d3c769502929 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > {
> > > bool ret = false;
> > > struct rcu_state *rsp = rdp->rsp;
> > > - struct rcu_node *rnp_root = rcu_get_root(rsp);
> > > -
> > > - raw_lockdep_assert_held_rcu_node(rnp);
> > > -
> > > - /* If the specified GP is already known needed, return to caller. */
> > > - trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > - if (need_future_gp_element(rnp, c)) {
> > > - trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf"));
> > > - goto out;
> > > - }
> > > + struct rcu_node *rnp_root;
> > >
> > > /*
> > > - * If this rcu_node structure believes that a grace period is in
> > > - * progress, then we must wait for the one following, which is in
> > > - * "c". Because our request will be noticed at the end of the
> > > - * current grace period, we don't need to explicitly start one.
> > > + * Use funnel locking to either acquire the root rcu_node
> > > + * structure's lock or bail out if the need for this grace period
> > > + * has already been recorded -- or has already started. If there
> > > + * is already a grace period in progress in a non-leaf node, no
> > > + * recording is needed because the end of the grace period will
> > > + * scan the leaf rcu_node structures. Note that rnp->lock must
> > > + * not be released.
> > > */
> > > - if (rnp->gpnum != rnp->completed) {
> > > - need_future_gp_element(rnp, c) = true;
> > > - trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> > > - goto out;
> >
> > Referring to the above negative diff as [1] (which I wanted to refer to later
> > in this message..)
> >
> > > + raw_lockdep_assert_held_rcu_node(rnp);
> > > + trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > + for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > + if (rnp_root != rnp)
> > > + raw_spin_lock_rcu_node(rnp_root);
> > > + if (need_future_gp_element(rnp_root, c) ||
> > > + ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > + (rnp != rnp_root &&
> > > + rnp_root->gpnum != rnp_root->completed)) {
> > > + trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > + goto unlock_out;
> >
> > I was a bit confused about the implementation of the above for loop:
> >
> > In the previous code (which I refer to in the negative diff [1]), we were
> > checking the leaf, and if the leaf believed that RCU was not idle, then we
> > were marking the need for the future GP and quitting this function. In the
> > new code, it seems like even if the leaf believes RCU is not-idle, we still
> > go all the way up the tree.
> >
> > I think the big change is, in the above new for loop, we either bail of if a
> > future GP need was already marked by an intermediate node, or we go marking
> > up the whole tree about the need for one.
> >
> > If a leaf believes RCU is not idle, can we not just mark the future GP need
> > like before and return? It seems we would otherwise increase the lock
> > contention since now we lock intermediate nodes and then finally even the
> > root. Where as before we were not doing that if the leaf believed RCU was not
> > idle.
> >
> > I am sorry if I missed something obvious.
>
> The trick is that we do the check before we have done the marking.
> So if we bailed, we would not have marked at all. If we are at an
> intermediate node and a grace period is in progress, we do bail.
>
> You are right that this means that we (perhaps unnecessarily) acquire
> the lock of the parent rcu_node, which might or might not be the root.
> And on systems with default fanout with 1024 CPUs or fewer, yes, it will
> be the root, and yes, this is the common case. So might be well worth
> improving.
>
> One way to implement the old mark-and-return approach as you suggest
> above would be as shown below (untested, probably doesn't build, and
> against current rcu/dev). What do you think?
>
> > The other thing is we now don't have the 'Startedleaf' trace like we did
> > before. I sent a patch to remove it, but I think the removal of that is
> > somehow connected to what I was just talking about.. and I was thinking if we
> > should really remove it. Should we add the case for checking leaves only back
> > or is that a bad thing to do?
>
> Suppose I got hit by a bus and you were stuck with the job of debugging
> this. What traces would you want and where would they be? Keeping in
> mind that too-frequent traces have their problems as well.
>
> (Yes, I will be trying very hard to avoid this scenario for as long as
> I can, but this might be a good way for you (and everyone else) to be
> thinking about this.)
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1abe29a43944..abf3195e01dc 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> goto unlock_out;
> }
> rnp_root->gp_seq_needed = c;
> + if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))
Right... Make that rnp->gp_seq. Memory locality and all that...
Thanx, Paul
> + goto unlock_out;
> if (rnp_root != rnp && rnp_root->parent != NULL)
> raw_spin_unlock_rcu_node(rnp_root);
> if (!rnp_root->parent)
next prev parent reply other threads:[~2018-05-12 14:43 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-23 3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 01/21] rcu: Improve non-root rcu_cbs_completed() accuracy Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 02/21] rcu: Make rcu_start_future_gp()'s grace-period check more precise Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 03/21] rcu: Add accessor macros for the ->need_future_gp[] array Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 04/21] rcu: Make rcu_gp_kthread() check for early-boot activity Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP Paul E. McKenney
2018-05-10 7:21 ` [tip/core/rcu, " Joel Fernandes
2018-05-10 13:15 ` Paul E. McKenney
2018-05-10 17:22 ` Joel Fernandes
2018-05-11 16:22 ` Paul E. McKenney
2018-05-10 17:37 ` Joel Fernandes
2018-05-11 16:24 ` Paul E. McKenney
2018-05-11 16:27 ` Joel Fernandes
2018-04-23 3:03 ` [PATCH tip/core/rcu 06/21] rcu: Avoid losing ->need_future_gp[] values due to GP start/end races Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 07/21] rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 08/21] rcu: Convert ->need_future_gp[] array to boolean Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 09/21] rcu: Make rcu_migrate_callbacks wake GP kthread when needed Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 10/21] rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 11/21] rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs() Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 12/21] rcu: Cleanup, don't put ->completed into an int Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 13/21] rcu: Clear request other than RCU_GP_FLAG_INIT at GP end Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 14/21] rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp() Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 15/21] rcu: Make rcu_start_future_gp() caller select grace period Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 16/21] rcu: Add funnel locking to rcu_start_this_gp() Paul E. McKenney
2018-05-12 6:03 ` [tip/core/rcu,16/21] " Joel Fernandes
2018-05-12 14:40 ` Paul E. McKenney
2018-05-12 14:44 ` Paul E. McKenney [this message]
2018-05-12 23:53 ` Joel Fernandes
2018-05-13 15:38 ` Paul E. McKenney
2018-05-13 16:49 ` Joel Fernandes
2018-05-13 19:09 ` Paul E. McKenney
2018-05-13 19:51 ` Joel Fernandes
2018-05-14 2:22 ` Paul E. McKenney
2018-05-14 5:00 ` Joel Fernandes
2018-05-14 13:23 ` Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 17/21] rcu: Make rcu_start_this_gp() check for out-of-range requests Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 18/21] rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp() Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 19/21] rcu: Simplify and inline cpu_needs_another_gp() Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 20/21] rcu: Drop early GP request check from rcu_gp_kthread() Paul E. McKenney
2018-04-23 3:03 ` [PATCH tip/core/rcu 21/21] rcu: Update list of rcu_future_grace_period() trace events Paul E. McKenney
2018-05-14 6:42 ` [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Nicholas Piggin
2018-05-14 16:09 ` Paul E. McKenney
2018-05-14 22:21 ` Nicholas Piggin
2018-05-14 22:42 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180512144438.GA12826@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=dipankar@in.ibm.com \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=jiangshanlai@gmail.com \
--cc=joel.opensrc@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@kernel.org \
--cc=npiggin@gmail.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).