From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Romanov Arya <romanov.arya@gmail.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>,
Josh Triplett <josh@joshtriplett.org>,
LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
torvalds@linux-foundation.org, Waiman.Long@hp.com
Subject: Re: [RFC PATCH 1/1] kernel/rcu/tree.c: simplify force_quiescent_state()
Date: Tue, 17 Jun 2014 10:10:01 -0700 [thread overview]
Message-ID: <20140617171001.GG4669@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAJKcOGdUDwdmdkpd0dw5xO9+TK=9W9fvF9Q+f5F_b4R8cEgdTA@mail.gmail.com>
On Tue, Jun 17, 2014 at 12:01:28PM -0400, Romanov Arya wrote:
> On Tue, Jun 17, 2014 at 10:54 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Mon, Jun 16, 2014 at 10:55:29PM -0400, Pranith Kumar wrote:
> >> This might sound really naive, but please bear with me.
> >>
> >> force_quiescent_state() used to do a lot of things in the past in addition to
> >> forcing a quiescent state. (From my reading of the mailing list, state
> >> transitions were one example.)
> >>
> >> Now, according to the code, multiple callers race up the hierarchy of nodes
> >> to see who reaches the root node first. The caller reaching the root node
> >> wins: it acquires the root node's lock and gets to set rsp->gp_flags!
> >>
> >> At each level of the hierarchy we try to acquire fqslock. This is the only place
> >> which actually uses fqslock.
> >>
> >> I guess this was being done to avoid contention on fqslock, but all we are
> >> doing here is setting one flag. This way of acquiring locks might reduce
> >> contention if every updater were trying to do some independent work, but here
> >> all we are doing is setting the same flag to the same value.
> >
> > Actually, to reduce contention on rnp_root->lock.
> >
> > The trick is that the "losers" at each level of ->fqslock acquisition go
> > away. The "winner" ends up doing the real work of setting RCU_GP_FLAG_FQS.
> >
> >> We can also remove fqslock completely if we do not need this. Also using
> >> cmpxchg() to set the value of the flag looks like a good idea to avoid taking
> >> the root node lock. Thoughts?
> >
> > The ->fqslock funnel was needed to avoid lockups on large systems (many
> > hundreds or even thousands of CPUs). Moving grace-period responsibilities
> > from softirq to the grace-period kthreads might have reduced contention
> > sufficiently to make the ->fqslock funnel unnecessary.  However, given
> > that I don't usually have access to such a large system, I will leave it,
> > at least for the time being.
>
> Sounds like a good case study for using the newly introduced MCS-based
> locks (qspinlock.h).
> Waiman, Peter?
No. Absolutely not.
Any exclusive lock, MCS or otherwise, will waste a huge amount of time
in this case.
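For reference, the funnel pattern in the current code boils down to something
like the following userspace sketch, with pthread mutexes and C11 atomics
standing in for raw spinlocks and ACCESS_ONCE().  The node layout and flag
encoding here are simplified illustrative assumptions, not the real rcu_node
types:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the rcu_node hierarchy: each node carries a
 * trylock used only as a funnel, and the root's lock guards the flag. */
struct node {
	pthread_mutex_t fqslock;
	struct node *parent;
};

static atomic_int gp_flags;	/* stand-in for rsp->gp_flags; bit 0 = FQS */
static pthread_mutex_t root_lock = PTHREAD_MUTEX_INITIALIZER;

/* Funnel upward from a leaf: trylock at each level, losers give up
 * immediately, and the winner releases the lock below before climbing.
 * At most one caller per subtree can be in flight toward the root. */
static void funnel_force_qs(struct node *leaf)
{
	struct node *old = NULL;
	struct node *np;

	for (np = leaf; np != NULL; np = np->parent) {
		bool lost = (atomic_load(&gp_flags) & 1) ||
			    pthread_mutex_trylock(&np->fqslock) != 0;
		if (old)
			pthread_mutex_unlock(&old->fqslock);
		if (lost)
			return;		/* someone else is handling it */
		old = np;
	}
	/* old == root here; set the flag under the root's lock. */
	pthread_mutex_lock(&root_lock);
	pthread_mutex_unlock(&old->fqslock);
	if (!(atomic_load(&gp_flags) & 1))
		atomic_fetch_or(&gp_flags, 1);
	pthread_mutex_unlock(&root_lock);
}
```

The point of the shape is that contention is bounded by the tree fanout at
each level, rather than having every CPU hammer a single lock word.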
> Btw, is doing the following a bad idea? It reduces contention on
> rnp_root->lock by using fqslock,
> which seems to be the lock that needs to be taken while forcing a
> quiescent state:
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f1ba773..f5a0e7e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2401,34 +2401,24 @@ static void force_quiescent_state(struct rcu_state *rsp)
> unsigned long flags;
> bool ret;
> struct rcu_node *rnp;
> - struct rcu_node *rnp_old = NULL;
> -
> - /* Funnel through hierarchy to reduce memory contention. */
> - rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> - for (; rnp != NULL; rnp = rnp->parent) {
> - ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
> - !raw_spin_trylock(&rnp->fqslock);
> - if (rnp_old != NULL)
> - raw_spin_unlock(&rnp_old->fqslock);
> - if (ret) {
> - ACCESS_ONCE(rsp->n_force_qs_lh)++;
> - return;
> - }
> - rnp_old = rnp;
> + struct rcu_node *rnp_root = rcu_get_root(rsp);
> +
> + if (!raw_spin_trylock(&rnp_root->fqslock)) {
This will be an epic fail on huge systems because it can result in massive
memory contention. And yes, I have recently reduced the probability of
large numbers of concurrent calls to force_quiescent_state(), but this
situation really can still happen, for example, if a bunch of CPUs are
doing lots of concurrent call_rcu() invocations.
Pranith, Romanov: You do -not-, repeat -not-, get to shoot from the hip
with this code. You absolutely need to understand what it is doing and
why before you try hacking on it. Otherwise, all that will happen is
that you will come up with additional creative ways to break RCU. Your
commit log and comments will need to clearly indicate that you understand
what happens if (say) 4096 CPUs all call force_quiescent_state() at the
same time.
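As for the cmpxchg() idea quoted below, the purely mechanical part is easy
enough to sketch with C11 atomics.  What this sketch deliberately does not
provide is the ordering that rnp_root->lock and smp_mb__after_unlock_lock()
supply in the real code, which is exactly the part that would need rethinking.
The flag value is assumed for illustration:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RCU_GP_FLAG_FQS 0x2	/* value assumed for illustration */

static atomic_uint gp_flags;	/* stand-in for rsp->gp_flags */

/* Set the FQS bit only if it is not already set, without any lock.
 * This shows the mechanics only: it gives atomicity on the flag word,
 * but none of the ordering against grace-period state that the root
 * lock provides in the real force_quiescent_state(). */
static bool try_set_fqs(void)
{
	unsigned int old = atomic_load(&gp_flags);

	do {
		if (old & RCU_GP_FLAG_FQS)
			return false;	/* someone beat us to it */
	} while (!atomic_compare_exchange_weak(&gp_flags, &old,
					       old | RCU_GP_FLAG_FQS));
	return true;			/* we set the flag */
}
```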
Thanx, Paul
> + ACCESS_ONCE(rsp->n_force_qs_lh)++;
> + return; /* Someone is already trying to force */
> }
> - /* rnp_old == rcu_get_root(rsp), rnp == NULL. */
>
> - /* Reached the root of the rcu_node tree, acquire lock. */
> - raw_spin_lock_irqsave(&rnp_old->lock, flags);
> - smp_mb__after_unlock_lock();
> - raw_spin_unlock(&rnp_old->fqslock);
> if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
> ACCESS_ONCE(rsp->n_force_qs_lh)++;
> - raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
> + raw_spin_unlock(&rnp_root->fqslock);
> return; /* Someone beat us to it. */
> }
> +
> + /* Reached the root of the rcu_node tree, acquire lock. */
> + raw_spin_lock_irqsave(&rnp_root->lock, flags);
> + smp_mb__after_unlock_lock();
> ACCESS_ONCE(rsp->gp_flags) |= RCU_GP_FLAG_FQS;
> - raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
> + raw_spin_unlock_irqrestore(&rnp_root->lock, flags);
> wake_up(&rsp->gp_wq); /* Memory barrier implied by wake_up() path. */
> }
>
> Regards,
> Romanov
>
> >
> > But you might be interested in thinking through what else would need to
> > change in order to make cmpxchg() work. ;-)
> >
> > Thanx, Paul
> >
> >> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> >> ---
> >> kernel/rcu/tree.c | 35 +++++++++++++----------------------
> >> 1 file changed, 13 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index f1ba773..9a46f32 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -2399,36 +2399,27 @@ static void force_qs_rnp(struct rcu_state *rsp,
> >> static void force_quiescent_state(struct rcu_state *rsp)
> >> {
> >> unsigned long flags;
> >> - bool ret;
> >> - struct rcu_node *rnp;
> >> - struct rcu_node *rnp_old = NULL;
> >> -
> >> - /* Funnel through hierarchy to reduce memory contention. */
> >> - rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> >> - for (; rnp != NULL; rnp = rnp->parent) {
> >> - ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
> >> - !raw_spin_trylock(&rnp->fqslock);
> >> - if (rnp_old != NULL)
> >> - raw_spin_unlock(&rnp_old->fqslock);
> >> - if (ret) {
> >> - ACCESS_ONCE(rsp->n_force_qs_lh)++;
> >> - return;
> >> - }
> >> - rnp_old = rnp;
> >> + struct rcu_node *rnp_root = rcu_get_root(rsp);
> >> +
> >> + /* early test to see if someone already forced a quiescent state
> >> + */
> >> + if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
> >> + ACCESS_ONCE(rsp->n_force_qs_lh)++;
> >> + return; /* Someone beat us to it. */
> >> }
> >> - /* rnp_old == rcu_get_root(rsp), rnp == NULL. */
> >>
> >> /* Reached the root of the rcu_node tree, acquire lock. */
> >> - raw_spin_lock_irqsave(&rnp_old->lock, flags);
> >> + raw_spin_lock_irqsave(&rnp_root->lock, flags);
> >> smp_mb__after_unlock_lock();
> >> - raw_spin_unlock(&rnp_old->fqslock);
> >> if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
> >> ACCESS_ONCE(rsp->n_force_qs_lh)++;
> >> - raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
> >> - return; /* Someone beat us to it. */
> >> + raw_spin_unlock_irqrestore(&rnp_root->lock, flags);
> >> + return; /* Someone actually beat us to it. */
> >> }
> >> +
> >> + /* can we use cmpxchg instead of the above lock? */
> >> ACCESS_ONCE(rsp->gp_flags) |= RCU_GP_FLAG_FQS;
> >> - raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
> >> + raw_spin_unlock_irqrestore(&rnp_root->lock, flags);
> >> wake_up(&rsp->gp_wq); /* Memory barrier implied by wake_up() path. */
> >> }
> >>
> >> --
> >> 1.9.1
> >>
> >
>
Thread overview: 14+ messages
2014-06-17 2:55 [RFC PATCH 1/1] kernel/rcu/tree.c: simplify force_quiescent_state() Pranith Kumar
2014-06-17 14:54 ` Paul E. McKenney
2014-06-17 16:01 ` Romanov Arya
2014-06-17 16:56 ` Waiman Long
2014-06-17 17:11 ` Paul E. McKenney
2014-06-17 17:37 ` Paul E. McKenney
2014-06-17 20:06 ` Waiman Long
2014-06-23 10:28 ` Peter Zijlstra
2014-06-23 15:57 ` Paul E. McKenney
2014-06-23 17:33 ` Paul E. McKenney
2014-06-23 18:57 ` Peter Zijlstra
2014-06-23 19:05 ` Paul E. McKenney
2014-06-17 17:10 ` Paul E. McKenney [this message]
2014-06-17 18:22 ` Pranith Kumar