Re: [PATCH RFC tip/core/rcu 1/5] rcu: Reduce overhead of cond_resched() checks for RCU

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, dvhart@linux.intel.com, fweisbec@gmail.com,
	sbw@mit.edu, Andi Kleen <ak@linux.intel.com>,
	Christoph Lameter <cl@gentwo.org>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH RFC tip/core/rcu 1/5] rcu: Reduce overhead of cond_resched() checks for RCU
Date: Mon, 23 Jun 2014 10:36:13 -0700	[thread overview]
Message-ID: <20140623173613.GI4603@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140623164321.GA5543@redhat.com>

On Mon, Jun 23, 2014 at 06:43:21PM +0200, Oleg Nesterov wrote:
> On 06/20, Paul E. McKenney wrote:
> >
> > This commit takes a different approach to fixing this bug, mainly by
> > avoiding having cond_resched() do an RCU-visible quiescent state unless
> > there is a grace period that has been in flight for a significant period
> > of time.  This commit also reduces the common-case cond_resched() overhead
> > to a check of a single per-CPU variable.
> 
> I can't say I fully understand this change, but I think it is fine.
> Just a really stupid question below.
> 
> > +void rcu_resched(void)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_data *rdp;
> > +	struct rcu_dynticks *rdtp;
> > +	int resched_mask;
> > +	struct rcu_state *rsp;
> > +
> > +	local_irq_save(flags);
> > +
> > +	/*
> > +	 * Yes, we can lose flag-setting operations.  This is OK, because
> > +	 * the flag will be set again after some delay.
> > +	 */
> > +	resched_mask = raw_cpu_read(rcu_cond_resched_mask);
> > +	raw_cpu_write(rcu_cond_resched_mask, 0);
> > +
> > +	/* Find the flavor that needs a quiescent state. */
> > +	for_each_rcu_flavor(rsp) {
> > +		rdp = raw_cpu_ptr(rsp->rda);
> > +		if (!(resched_mask & rsp->flavor_mask))
> > +			continue;
> > +		smp_mb(); /* ->flavor_mask before ->cond_resched_completed. */
> > +		if (ACCESS_ONCE(rdp->mynode->completed) !=
> > +		    ACCESS_ONCE(rdp->cond_resched_completed))
> > +			continue;
> 
> Probably the comment above mb() meant "rcu_cond_resched_mask before
> ->cond_resched_completed" ? Otherwise I can't see why do we need any
> barrier.

You are absolutely right, changed as suggested.

> > @@ -893,13 +946,20 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
> >  	}
> >
> >  	/*
> > -	 * There is a possibility that a CPU in adaptive-ticks state
> > -	 * might run in the kernel with the scheduling-clock tick disabled
> > -	 * for an extended time period.  Invoke rcu_kick_nohz_cpu() to
> > -	 * force the CPU to restart the scheduling-clock tick in this
> > -	 * CPU is in this state.
> > +	 * A CPU running for an extended time within the kernel can
> > +	 * delay RCU grace periods.  When the CPU is in NO_HZ_FULL mode,
> > +	 * even context-switching back and forth between a pair of
> > +	 * in-kernel CPU-bound tasks cannot advance grace periods.
> > +	 * So if the grace period is old enough, make the CPU pay attention.
> >  	 */
> > -	rcu_kick_nohz_cpu(rdp->cpu);
> > +	if (ULONG_CMP_GE(jiffies, rdp->rsp->gp_start + 7)) {
> > +		rcrmp = &per_cpu(rcu_cond_resched_mask, rdp->cpu);
> > +		ACCESS_ONCE(rdp->cond_resched_completed) =
> > +			ACCESS_ONCE(rdp->mynode->completed);
> > +		smp_mb(); /* ->cond_resched_completed before *rcrmp. */
> > +		ACCESS_ONCE(*rcrmp) =
> > +			ACCESS_ONCE(*rcrmp) + rdp->rsp->flavor_mask;
> > +	}
> 
> OK, in this case I guess we need a full barrier because we need to read
> rcu_cond_resched_mask before updating it...
> 
> But, I am just curious, is there any reason to use ACCESS_ONCE() twice?
> 
> 	ACCESS_ONCE(*rcrmp) |= rdp->rsp->flavor_mask;
> 
> or even
> 
> 	ACCESS_ONCE(per_cpu(rcu_cond_resched_mask, rdp->cpu)) |=
> 		rdp->rsp->flavor_mask;
> 
> should equally work, or ACCESS_ONCE() can't be used to RMW ?

It can be, but Linus doesn't like it to be.  I recently changed all of
the RMW ACCESS_ONCE() calls as a result.  One of the reasons for avoiding
RMW ACCESS_ONCE() is that language features that might one day replace
ACCESS_ONCE() do not support RMW use.

> (and in fact at least the 2nd ACCESS_ONCE() (load) looks unnecessary anyway
>  because of smp_mb() above).

It is unlikely, but without ACCESS_ONCE() some misbegotten compiler could
split the load and still claim to be conforming to the standard.  :-(
(This is called "load tearing" by the standards guys.)

> Once again, of course I am not arguing if there is no "real" reason and you
> just prefer it this way. But the kernel has more and more ACESS_ONCE() users
> and sometime I simply do not understand why it is needed. For example,
> cyc2ns_write_end().

Could be concern about store tearing.

> Or even INIT_LIST_HEAD_RCU(). The comment in list_splice_init_rcu() says:
> 
> 	/*
> 	 * "first" and "last" tracking list, so initialize it.  RCU readers
> 	 * have access to this list, so we must use INIT_LIST_HEAD_RCU()
> 	 * instead of INIT_LIST_HEAD().
> 	 */
> 
> 	INIT_LIST_HEAD_RCU(list);
> 
> but we are going to call synchronize_rcu() or something similar, this should
> act as compiler barrier too?

Indeed, synchronize_rcu() enforces a barrier on each CPU between
any prior and subsequent accesses to RCU-protected data by that CPU.
(Which means that CPUs that would otherwise sleep through the entire
grace period can continue sleeping, given that it is not accessing
any RCU-protected data while sleeping.)  I would guess load-tearing
or store-tearing concerns.


							Thanx, Paul

next prev parent reply	other threads:[~2014-06-23 17:36 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-20 18:32 [PATCH tip/core/rcu 0/5] Fix for cond_resched performance regression Paul E. McKenney
2014-06-20 18:33 ` [PATCH RFC tip/core/rcu 1/5] rcu: Reduce overhead of cond_resched() checks for RCU Paul E. McKenney
2014-06-20 18:33   ` [PATCH RFC tip/core/rcu 2/5] rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops Paul E. McKenney
2014-06-20 18:33   ` [PATCH RFC tip/core/rcu 3/5] rcu: Add RCU_COND_RESCHED_QS for large systems Paul E. McKenney
2014-06-20 18:33   ` [PATCH RFC tip/core/rcu 4/5] rcutorture: Suppress spurious RCU CPU stall warnings Paul E. McKenney
2014-06-20 18:33   ` [PATCH RFC tip/core/rcu 5/5] rcu: Add boot/sysfs control for RCU cond_resched() help solicitation Paul E. McKenney
2014-06-23 16:43   ` [PATCH RFC tip/core/rcu 1/5] rcu: Reduce overhead of cond_resched() checks for RCU Oleg Nesterov
2014-06-23 17:36     ` Paul E. McKenney [this message]
2014-06-23 18:35       ` Oleg Nesterov
2014-06-24  0:18         ` Paul E. McKenney
2014-07-22  4:35   ` Pranith Kumar
2014-07-22  4:52     ` Pranith Kumar
2014-07-22 11:07       ` Paul E. McKenney
2014-07-22 11:06     ` Paul E. McKenney
2014-06-20 19:04 ` [PATCH tip/core/rcu 0/5] Fix for cond_resched performance regression josh
2014-06-20 22:04   ` Paul E. McKenney
2014-06-20 19:12 ` Paul E. McKenney
2014-06-20 21:24   ` josh
2014-06-20 22:11     ` Paul E. McKenney
2014-06-20 22:39       ` josh
2014-06-20 23:30         ` Paul E. McKenney
2014-06-20 23:52           ` josh
2014-06-21  0:14             ` Paul E. McKenney
2014-06-21  0:36               ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140623173613.GI4603@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@gentwo.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=dvhart@linux.intel.com \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=niv@us.ibm.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sbw@mit.edu \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.